Trademark Office ("the Office"). Furthermore, as these database citations concern a human genomic
clone from chromosome 2 (GenBank Accession Number AC025750), and a cDNA clone from Rattus
norvegicus (GenBank Accession Number AI043703), Applicants submit that these citations are not
germane to the presently pending claims, and are therefore not provided.

Second, the Examiner notes that "the Applicant has not submitted an English language 
translation or an English language abstract for document DE 19841413C" (the Action at page 2).
Applicants note for the record that only the German patent application itself was provided by the 
European Patent Office, which is why no English language abstract was provided to the Office. 
Therefore, Applicants provide herewith a copy of the English language abstract from the published 
PCT patent application based on DE 19841413C (Exhibit A; see priority information).

IV. Title 

The Action objects to the title of the application as allegedly "not descriptive" (the Action at 
page 3). Applicants have amended the title of the present application based on the suggestion from the 
Examiner. 

Applicants request that, since the objection has been overcome, this objection be withdrawn. 

V. Rejection of Claims 1, 3, and 5-11 Under 35 U.S.C. § 101

The Action first rejects claims 1, 3, and 5-11 under 35 U.S.C. § 101, as allegedly lacking a
patentable utility. Applicants respectfully traverse. 

First, while Applicants in no way agree with the Examiner's position that claims 1, 7 and 10
lack a patentable utility, as claims 1, 7 and 10 have been cancelled entirely without prejudice and
without disclaimer, the present rejection of claims 1, 7 and 10 under 35 U.S.C. § 101 is rendered
moot. The remainder of this section will therefore focus on claims 3, 5, 6, 8, 9 and 11.

The presently claimed sequence has clearly been described by Applicants in the specification
as originally filed as an ion channel protein (see, at least, the title of the application as originally filed, and
page 2, lines 2-4 of the specification), and more particularly a voltage-gated potassium channel protein
(see, at least, page 2, line 5 of the specification). Additionally, Applicants respectfully point out that
the presently claimed sequence shares 100% identity at the amino acid level over the entire length of
SEQ ID NO:2 with two sequences that are present in the leading scientific repository for biological
sequence data (GenBank), which have been annotated by independent third party scientists wholly
unaffiliated with Applicants as "Homo sapiens voltage-gated potassium channel subunit Kv10.1a"
(GenBank accession number AF454547; alignment and GenBank report provided in Exhibit B), and
"Homo sapiens potassium voltage-gated channel subfamily G, member 3" (Kv6.3, GenBank accession
number NM_172344; alignment and GenBank report provided in Exhibit C). It is well known in the
art that the Kv10.1a and Kv6.3 subunits are alternative names for the same protein (see page 2 from
the GenBank report provided in Exhibit C). Furthermore, three independent groups of scientists have
established that the presently claimed sequence specifically interacts with the well-studied Kv2.1
voltage-gated potassium channel subunit to form functional voltage-gated potassium ion channels
(Sano et al., FEBS Lett. 512:230-234, 2002 ("Sano"; copy of the abstract provided in Exhibit D);
Ottschytsch et al., Proc. Natl. Acad. Sci. USA 99:7986-7991, 2002 ("Ottschytsch"; copy of the
manuscript provided in Exhibit E); and Vega-Saenz de Miera, Brain Res. Mol. Brain Res.
123:91-103, 2004 ("Vega-Saenz"; copy of the abstract provided in Exhibit F)), thus confirming
Applicants' assertion that the presently claimed sequence, which is identical to the Kv10.1a and Kv6.3
proteins described above, is a voltage-gated potassium ion channel protein. Applicants respectfully
point out that whether or not the Sano, Ottschytsch, and Vega-Saenz references cited by Applicants
above were available at the time of filing of the present application is not germane to the utility issue at
hand. Applicants point to the Sano, Ottschytsch, and Vega-Saenz references not to evidence that
these sequences were known in the art at the time the present application was filed, but, rather, to
evidence that other skilled artisans have confirmed Applicants' assertion that the presently claimed
sequence is a voltage-gated potassium ion channel protein.

The Examiner states that "the specification does not disclose disorders or conditions associated
with a mutated, deleted, or translocated gene" (the Action at page 7). First, Applicants point out that
the disclosure of "disorders or conditions associated with a mutated, deleted, or translocated gene" is
not the standard for patentability under 35 U.S.C. § 101 (In re Brana, 34 USPQ2d 1436 (Fed. Cir.
1995); "Brana"). Second, and more importantly, Applicants respectfully point out that the presently
claimed sequence has clearly been described by Applicants in the specification as originally filed as a
voltage-gated ion channel protein (see, at least, page 2, line 5 of the specification) that is involved in
high blood pressure, arrhythmia, and diabetes (see, at least, page 13, lines 19-22 of the specification).
Furthermore, Applicants respectfully point out that the present sequence has been shown to specifically
modulate Kv2.1 voltage-gated potassium ion channel subunits (see Exhibits D-F), and that the
association of Kv2.1 voltage-gated potassium ion channel subunits and high blood pressure (Michelakis
et al., Adv. Exp. Med. Biol. 502:401-418, 2001; copy of abstract provided in Exhibit G), arrhythmia
(Lee et al., Am. J. Physiol. 277:H1725-H1731, 1999 (copy of article provided in Exhibit H) and
Huang et al., J. Cardiovasc. Electrophysiol. 11:1252-1261, 2000 (copy of abstract provided in
Exhibit I)), and diabetes (MacDonald et al., Mol. Endocrinol. 15:1423-1435, 2001 (copy of article
provided in Exhibit J) and Qin et al., Biochem. Biophys. Res. Commun. 283:549-553, 2001 (copy
of abstract provided in Exhibit K)) were all well-known in the art at the time the present application
was filed. Example 10 of the Revised Interim Utility Guidelines Training Materials (pages 53-55;
Exhibit L), which have been set forth by the United States Patent and Trademark Office ("the
USPTO"), clearly establishes that a rejection under 35 U.S.C. § 101 as allegedly lacking a patentable
utility, and under 35 U.S.C. § 112, first paragraph, as allegedly unusable by the skilled artisan due to
the alleged lack of patentable utility (see Section VI, below), is not proper when a full length sequence
(such as the presently claimed sequence) has a similarity score greater than 95% to a protein having a
well-established utility. Therefore, based on the 100% identity between the presently claimed sequence
and Kv6.3 and Kv10.1a, the established interaction between the claimed sequence and Kv2.1, and
the association of Kv2.1 and high blood pressure, arrhythmia and diabetes, as detailed above, the
present situation exactly tracks Example 10 of the Revised Interim Utility Guidelines Training
Materials, and the USPTO's own examination guidelines clearly indicate that the present claims meet the
requirements of 35 U.S.C. § 101 and 35 U.S.C. § 112, first paragraph (see Section VI, below). Thus,
the present rejection of claims 3, 5, 6, 8, 9 and 11 should be withdrawn.

The Examiner cites a number of articles (Wells, Biochemistry 29:8509-8517, 1990;
Ngo et al., "The Protein Folding Problem and Tertiary Structure Prediction", pp. 492-495, 1994;
Lehmann-Horn et al., Physiol. Rev. 79:1317-1372, 1999; Bork, Genome Res. 10:398-400, 2000;
Skolnick et al., Trends in Biotech. 18:34-39, 2000; Doerks et al., Trends in Genetics 14:248-250,
1998; Smith et al., Nature Biotechnology 15:1222-1223, 1997; Brenner, Trends in Genetics
15:132-133, 1999; and Bork et al., Trends in Genetics 12:425-427, 1996) for the proposition that
"(t)he problem of predicting protein and DNA structure from sequence data and in turn utilizing
predicted structural determinations to ascertain functional aspects of the protein and DNA is extremely
complex" (the Action at page 8), and, thus, "the art recognizes that function cannot be predicted from
structure alone" (the Action at page 9). Applicants suggest that such citations reflect that the Examiner
appears to believe that extensive structural similarity is not enough to establish a specific utility. First,
Applicants respectfully point out that the Examiner's position directly contradicts the position of the
USPTO itself, as set forth in Example 10 of the Revised Interim Utility Guidelines Training Materials
(see Exhibit L), which clearly establishes that structural similarity can in fact be used to establish
function, and thus establish a specific utility. Second, rather than detail the numerous failings of each
of the articles cited by the Examiner, Applicants merely note for the record that scientific manuscripts
from as far back as 1990 can hardly be considered to reflect the state of the art at the time the present
application was filed. Therefore, as the USPTO's own examination guidelines clearly indicate that
structural similarity can in fact be used to establish function, and thus establish a specific utility, the
present claims meet the requirements of 35 U.S.C. § 101 and 35 U.S.C. § 112, first paragraph (see
Section VI, below). Thus, the present rejection of claims 3, 5, 6, 8, 9 and 11 should be withdrawn.

It has been well established that Applicants need only make one credible assertion of utility to
meet the requirements of 35 U.S.C. § 101 (Raytheon v. Roper, 220 USPQ 592 (Fed. Cir. 1983);
In re Gottlieb, 140 USPQ 665 (CCPA 1964); In re Malachowski, 189 USPQ 432 (CCPA 1976);
Hoffman v. Klaus, 9 USPQ2d 1657 (Bd. Pat. App. & Inter. 1988)), and, thus, any questions
concerning whether or not the present claims meet the requirements of 35 U.S.C. § 101 should have
been laid to rest. Nevertheless, Applicants respectfully point out that the present invention has a
number of additional substantial and credible utilities, not the least of which is in forensic biology, as
described in the specification as originally filed, at least at page 3, line 12. As described in the
specification as originally filed, at page 17, lines 3-5, a coding single nucleotide polymorphism was
identified in the presently claimed sequence - specifically, a silent G/C polymorphism at nucleotide
position 432 of SEQ ID NO:1, both alleles of which result in a glycine being present at amino acid position
144 of SEQ ID NO:2. As such polymorphisms are the basis for forensic analysis, which is undoubtedly
a "real world" utility, the presently claimed sequence must in itself be useful. Thus, the present claims
clearly meet the requirements of 35 U.S.C. § 101.

Applicants respectfully point out that, even though Applicants asserted in the specification as
originally filed that the presently claimed sequence is involved in high blood pressure, arrhythmia, and
diabetes (see above), the use of the presently described polymorphism in forensic analysis does
not even require the identification of a specific medical condition. One aspect of forensic analysis is
to distinguish individual members of the human population from one another based solely on the
presence or absence of one or more polymorphic markers, such as the presently described
polymorphism. As polymorphic markers such as the presently described polymorphism have been
used in forensic analysis for decades, this is clearly a well-established technique, and as such, specific
guidance does not need to be provided in the present specification, for it has long been established that
a patent need not disclose what is well-known in the art (In re Wands, 8 USPQ2d 1400 (Fed. Cir.
1988)). Thus, the Examiner's argument does not support the alleged lack of utility.
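
The arithmetic behind the silent G/C polymorphism described above (nucleotide position 432, amino acid position 144) can be checked with a short sketch. This is illustrative only and is not taken from the specification: it assumes the coding sequence begins at nucleotide 1 of SEQ ID NO:1, it assumes the first two bases of codon 144 are GG (the actual bases are not reproduced here), and the function names are hypothetical.

```python
# Illustrative sketch only. Assumptions (not from the specification):
#  - the open reading frame starts at nucleotide 1 of SEQ ID NO:1
#  - codon 144 has the form GGN, so its first two bases are "GG"

# In the standard genetic code, all four GGN codons encode glycine.
GLYCINE_CODONS = {"GGA", "GGC", "GGG", "GGT"}


def codon_index(nucleotide_position: int) -> int:
    """Return the 1-based codon number containing a 1-based nucleotide position."""
    return (nucleotide_position - 1) // 3 + 1


def position_in_codon(nucleotide_position: int) -> int:
    """Return 1, 2 or 3: where the nucleotide sits within its codon."""
    return (nucleotide_position - 1) % 3 + 1


def is_silent_glycine_variant(first_two_bases: str, alleles: str) -> bool:
    """True if every allele at the third codon position still yields glycine."""
    return all(first_two_bases + allele in GLYCINE_CODONS for allele in alleles)


if __name__ == "__main__":
    print(codon_index(432))                       # codon containing nucleotide 432
    print(position_in_codon(432))                 # position within that codon
    print(is_silent_glycine_variant("GG", "GC"))  # G and C alleles both give glycine?
```

Under these assumptions, nucleotide 432 falls at the third (wobble) position of codon 144, which is consistent with either allele leaving a glycine at amino acid position 144.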

This is also not a case of a "potential" utility. Using the polymorphic marker exactly as
described in the specification as originally filed, the skilled artisan can readily distinguish individuals from
one another. Applicants respectfully point out that while using forensic analysis to make a positive
identification would require information concerning the percentage of a population that contains the
polymorphism, elimination of an individual from a pool of suspects requires no information at all
concerning the percentage of a population that contains the polymorphism. Applicants point out that
in the worst case scenario, each polymorphic marker is useful to eliminate 50% of the population (in
other words, the marker being present in half of the population). This is an inherent feature of any
polymorphic marker, as the largest percentage of a population that two polymorphic markers can define
is 50% each. If a polymorphic marker is present at a level of less than 50%, then that marker is even
more informative, i.e., a greater percentage of the population can be eliminated on the basis of the
marker. Nevertheless, the ability to eliminate even 50% of the population from a pool of suspects
clearly is a real world, practical utility.
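
The exclusion arithmetic in the preceding paragraph can be sketched as follows. This is a deliberately simplified model, offered only as an illustration: it assumes that a known fraction of the population carries the marker, that the evidence sample carries it, and that everyone lacking the marker is therefore excluded. The function name and the example fractions are hypothetical and do not come from the specification.

```python
# Simplified illustration; the model and the numbers are assumptions.
# If the evidence sample carries the marker, individuals who lack the
# marker are excluded from the pool of suspects.


def excluded_fraction(carrier_fraction: float) -> float:
    """Fraction of the population excluded when the evidence carries the marker."""
    if not 0.0 <= carrier_fraction <= 1.0:
        raise ValueError("carrier_fraction must be between 0 and 1")
    return 1.0 - carrier_fraction


if __name__ == "__main__":
    # Marker present in half of the population (the case discussed above).
    print(excluded_fraction(0.5))
    # A rarer marker (hypothetical 20% carrier fraction) excludes more people.
    print(excluded_fraction(0.2))
```

In this model, a marker carried by a smaller share of the population excludes a larger share, which is the sense in which a marker present at less than 50% is described above as more informative.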

Applicants point out that naturally occurring genetic polymorphisms such as the polymorphism
described in the specification as originally filed are both the basis of, and critical to, inter alia, forensic
genetic analysis intended to resolve issues of, for example, identity or paternity. Forensic analysis based
on polymorphisms such as the polymorphism identified by Applicants is used to rule out suspects in
many criminal cases, and to rule out suspects in the identification of human remains. Paternity
determination is based on polymorphisms such as the polymorphism identified by Applicants to rule out
individuals suspected of fathering a particular child. What could possibly be more substantial and
real world than the loss of an individual's freedom or life through incarceration? What could
possibly be more substantial and real world than the identification of human remains? What could
possibly be more substantial and real world than the impact, both economic and emotional, that the
results of a paternity analysis have on the individuals directly and indirectly involved? These are all well
known and generally accepted uses of polymorphisms such as the polymorphism identified by
Applicants. Without such identified polymorphisms, the skilled artisan would not be able to carry out
such forensic or paternal analyses. Therefore, as the use of the presently described polymorphic
marker in forensic analysis is clearly a real world and substantial utility, the presently claimed sequences
meet the requirements of 35 U.S.C. § 101.

The Examiner seems to imply that the present claims do not meet the requirements of
35 U.S.C. § 101 because "extensive experimentation" (the Action at page 3), "significant further
research" (the Action at page 4), or "(s)ignificant further experimentation" (the Action at page 6 (twice)
and page 7) would be required in certain aspects of the invention. Applicants first point out that the use
of the presently described polymorphic marker in forensic analysis, as detailed above, requires no
further research. Thus, the presently described polymorphism can be used to eliminate an individual
from a pool of suspects in its currently available form. Second, Applicants respectfully point out that
the proper standard for meeting the requirements of 35 U.S.C. § 101 is not whether "extensive
experimentation", "significant further research", or "(s)ignificant further experimentation" is required to
practice certain aspects of the claimed invention, but whether undue experimentation would be required
to practice the claimed invention. The widespread use of polymorphisms such as that described by
Applicants in forensic analysis every day strongly argues against such a use requiring
"undue experimentation". Applicants point out that in assessing the question of whether undue
experimentation would be required in order to practice the claimed invention, the key term is "undue",
not "experimentation". In re Angstadt and Griffin, 190 USPQ 214 (CCPA 1976). However, even
if, arguendo, further research might be required in certain aspects of the present invention, this does
not preclude a finding that the invention has utility. As clearly set forth by the Federal Circuit in
In re Brana (34 USPQ2d 1436 (Fed. Cir. 1995); "Brana"), "pharmaceutical inventions, necessarily
includes the expectation of further research and development" (Brana at 1442-1443, emphasis added).
Thus, the need for some experimentation clearly does not render the claimed invention unpatentable
(see also In re Wands, supra). Indeed, a considerable amount of experimentation may be permissible
if such experimentation is routinely practiced in the art. In re Angstadt and Griffin, supra; Amgen,
Inc. v. Chugai Pharmaceutical Co., Ltd., 18 USPQ2d 1016 (Fed. Cir. 1991). Thus, the present
claims clearly meet the requirements of 35 U.S.C. § 101.

Applicants respectfully point out that as the presently described polymorphism is a part of the
family of polymorphisms that have a well-established utility, the Federal Circuit's holding in
Brana (supra) is directly on point. In Brana, the Federal Circuit admonished the USPTO for confusing
"the requirements under the law for obtaining a patent with the requirements for obtaining government
approval to market a particular drug for human consumption". Brana at 1442. The Federal Circuit
went on to state:

At issue in this case is an important question of the legal constraints on patent office 
examination practice and policy. The question is, with regard to pharmaceutical 
inventions, what must the applicant provide regarding the practical utility or usefulness 
of the invention for which patent protection is sought. This is not a new issue: it is one 
which we would have thought had been settled by case law years ago.

Brana at 1439, emphasis added. The choice of the phrase "utility or usefulness" in the foregoing
quotation is highly pertinent. The Federal Circuit is evidently using "utility" to refer to rejections under
35 U.S.C. § 101, and is using "usefulness" to refer to rejections under 35 U.S.C. § 112, first
paragraph. This is made evident in the continuing text in Brana, which explains the correlation between
35 U.S.C. §§ 101 and 112, first paragraph. The Federal Circuit concluded:

FDA approval, however, is not a prerequisite for finding a compound useful within the 
meaning of the patent laws. Usefulness in patent law, and in particular in the context 
of pharmaceutical inventions, necessarily includes the expectation of further research 
and development. The stage at which an invention in this field becomes useful is well 
before it is ready to be administered to humans. Were we to require Phase II testing
in order to prove utility, the associated costs would prevent many companies from
obtaining patent protection on promising new inventions, thereby eliminating an
incentive to pursue, through research and development, potential cures in many crucial
areas such as the treatment of cancer. 

Brana at 1442-1443, citations omitted. Thus, based on the holding in Brana, the present claims meet
the requirements under 35 U.S.C. § 101 and 35 U.S.C. § 112, first paragraph (see Section VI,
below).

It is important to note that it has been clearly established that a statement of utility in a
specification must be accepted absent reasons why one skilled in the art would have reason to doubt
the objective truth of such statement. In re Langer, 503 F.2d 1380, 1391, 183 USPQ 288, 297
(CCPA 1974; "Langer"); In re Marzocchi, 439 F.2d 220, 224, 169 USPQ 367, 370 (CCPA
1971). As clearly set forth in Langer:

As a matter of Patent Office practice, a specification which contains a disclosure of
utility which corresponds in scope to the subject matter sought to be patented must be
taken as sufficient to satisfy the utility requirement of § 101 for the entire claimed
subject matter unless there is a reason for one skilled in the art to question the objective
truth of the statement of utility or its scope.

Langer at 297, emphasis in original. As set forth in the Manual of Patent Examining Procedure
("MPEP"), "Office personnel must provide evidence sufficient to show that the statement of asserted
utility would be considered 'false' by a person of ordinary skill in the art" (MPEP, Eighth Edition at
2100-40, emphasis added). Therefore, absent evidence from the Examiner that the presently described
polymorphic marker could not be used in forensic analysis as detailed above, as the skilled artisan
would readily understand that the present polymorphic marker has utility in forensic analysis, the present
claims clearly meet the requirements of 35 U.S.C. § 101.

Additionally, given the association between the presently claimed sequence and high blood
pressure, arrhythmia, and diabetes, as detailed above, those of skill in the art would readily appreciate
the importance of tracking the expression of the genes encoding the described proteins, as described
in the specification as originally filed, at least at page 6, lines 5-7. In particular, the specification
describes how the described sequences can be represented using a gene chip format to provide a high
throughput analysis of the level of gene expression. Such "DNA chips" clearly have utility, as evidenced
by hundreds of issued U.S. Patents, as exemplified by U.S. Patent Nos. 5,445,934, 5,556,752,
5,744,305 (Exhibits M-O; submitted with the Information Disclosure Statement filed on
March 5, 2002), and U.S. Patent Nos. 5,837,832, 6,156,501 and 6,261,776 (Exhibits P-R; copies
of issued U.S. Patents not provided pursuant to requests from the USPTO). As the present sequences
are specific markers of the human genome (see below), and such specific markers are targets for the
discovery of drugs that are associated with human disease, those of skill in the art would instantly
recognize that the present nucleotide sequences would be an ideal, novel candidate for assessing gene
expression using such DNA chips. Given the widespread utility of such "gene chip" methods using
public domain gene sequence information, there can be little doubt that the use of the presently
described novel sequences would have great utility in such DNA chip applications. Clearly,
compositions that enhance the utility of such DNA chips, such as the presently claimed nucleotide
sequences, must in themselves be useful.

Further evidence of the "real world" substantial utility of the present invention is provided by
the fact that there is an entire industry established based on the use of gene sequences or fragments
thereof in a gene chip format. Perhaps the most notable gene chip company is Affymetrix. However,
there are many companies which have, at one time or another, concentrated on the use of gene
sequences or fragments, in gene chip and non-gene chip formats, for example: Gene Logic, ABI-
Perkin-Elmer, HySeq and Incyte. In addition, one such company (Rosetta Inpharmatics) was viewed
to have such "real world" value that it was acquired by a large pharmaceutical company (Merck) for
significant sums of money (net equity value of the transaction was $620 million). The "real world"
substantial industrial utility of gene sequences or fragments would, therefore, appear to be widespread
and well established. Clearly, persons of skill in the art, as well as venture capitalists and investors,
readily recognize the utility, both scientific and commercial, of genomic data in general, and specifically
human genomic data. Billions of dollars have been invested in the human genome project, resulting in
useful genomic data (see, e.g., Venter et al., Science 291:1304, 2001; Exhibit S). The results have
been a stunning success as the utility of human genomic data has been widely recognized as a great gift
to humanity (see, e.g., Jasny and Kennedy, Science 291:1153, 2001; Exhibit T). Clearly, the
usefulness of human genomic data, such as the presently claimed nucleic acid molecules, is substantial
and credible (worthy of billions of dollars and the creation of numerous companies focused on such
information) and well-established (the utility of human genomic information has been clearly understood
for many years). Thus, the present claims clearly meet the requirements of 35 U.S.C. § 101.

The Examiner alleges that this asserted utility is "not specific or substantial" because "any
polynucleotide sequence" can be used in this manner (the Action at page 6). This argument is flawed
in a number of respects. First, Applicants respectfully point out that the association between the
presently claimed sequence and high blood pressure, arrhythmia, and diabetes, as detailed above, is
not true for "any polynucleotide sequence". Furthermore, expression profiling does not even require
a knowledge of the function of the particular nucleic acid on the chip - rather the gene chip indicates
which DNA fragments are expressed at greater or lesser levels in two or more particular tissue types.
Skilled artisans already have used and continue to use sequences such as Applicants' in gene chip
applications without further experimentation. Second, Applicants respectfully point out that only
expressed polynucleotide sequences can be used to track gene expression, not just "any polynucleotide
sequence". Third, the Examiner appears to be confusing the requirement for a specific utility, which is
the proper standard for utility under 35 U.S.C. § 101, with a requirement for a unique utility, which is
clearly an improper standard. As clearly set forth by the Federal Circuit in Carl Zeiss Stiftung v.
Renishaw PLC, 20 USPQ2d 1101 (Fed. Cir. 1991; "Carl Zeiss"):

An invention need not be the best or only way to accomplish a certain result, and it 
need only be useful to some extent and in certain applications: "[T]he fact that an 
invention has only limited utility and is only operable in certain applications is not 
grounds for finding a lack of utility." Envirotech Corp. v. Al George, Inc., 221 USPQ
473, 480 (Fed. Cir. 1984).

Following directly from the quote above, an invention does not need to be the only way to accomplish
a certain result. Thus, the question of whether or not other nucleic acid sequences can be used to
assess gene expression patterns is completely irrelevant to the present utility inquiry. The only relevant
question in regard to meeting the standards of 35 U.S.C. § 101 is whether "any polynucleotide
sequence" can be so used - and the clear answer to this question is an emphatic no. Importantly, the
holding in Carl Zeiss is mandatory legal authority that essentially controls the outcome of the present
case. This case, and particularly the cited quote, directly rebuts the Examiner's argument.
Furthermore, the requirement for a unique utility is clearly not the standard adopted by the USPTO.
If every invention were required to have a unique utility, the USPTO would no longer be issuing patents
on batteries, automobile tires, golf balls, golf clubs, and treatments for a variety of human diseases, such
as cancer and bacterial or viral infections, just to name a few particular examples, because examples
of each of these have already been described and patented. All batteries have the exact same utility -
specifically, to provide power. All automobile tires have the exact same utility - specifically, for use on
automobiles. All golf balls and golf clubs have the exact same utility - specifically, use in the game of
golf. All cancer treatments have the exact same utility - specifically, to treat cancer. All anti-infectious
agents have the exact same broader utility - specifically, to treat infections. However, only the briefest
perusal of virtually any issue of the Official Gazette provides numerous examples of patents being
granted on each of the above compositions every week. Additionally, if a composition needed to be
unique to be patented, the entire class and subclass system would be an effort in futility, as the class and
subclass system serves solely to group such common inventions, which would not be required if each
invention needed to have a unique utility. Thus, the present sequence clearly meets the requirements
of 35 U.S.C. § 101.

Applicants note that the Examiner correctly determines that the generic class with regard to 
the present invention is "any polynucleotide sequence", but then attempts to narrow the generic class 
of the invention to include only those nucleic acids that are expressed (and associated with high blood 
pressure, arrhythmia, and diabetes) in order to support an allegation that the claimed nucleic acids lack 
a "specific" utility. Applicants reiterate that not all nucleic acids are expressed - in fact, only 2-4% of 
all nucleotide sequences are expressed, and only a very small number of these are associated with high 
blood pressure, arrhythmia, and diabetes. Therefore, the question of whether the asserted utility is
"specific", as opposed to "generic", has clearly been laid to rest. Applicants note that such redefinition 
of the generic class of the invention is completely improper, and in clear defiance of established case 
law. Therefore the present claims are clearly in compliance with 35 U.S.C. § 101. 

The Examiner further discounts this assertion of utility because "the specification does not
disclose specific cDNA ... targets" (the Action at page 6). This is simply not true. The specification
as originally filed at page 3, lines 28-33, clearly states that the presently claimed sequence "is expressed
in human fetal brain, brain, cerebellum, pituitary, prostate, thymus, lymph node, bone marrow, trachea,
fetal liver, liver, testis, thyroid, salivary gland, stomach, skeletal muscle, heart, uterus, adipose,
hypothalamus, ovary, tongue, aorta, 12 week old embryo, adenocarcinoma, and osteosarcoma cells".
Thus, the Examiner's argument in no way supports the allegation that the presently claimed sequences 
lack a patentable utility. 

As yet a further example of the utility of the presently claimed polynucleotides, as described in
the specification at page 3, lines 2-4, the present nucleotide sequences have a specific utility in "the
identification of protein coding sequences" and "mapping a unique gene to a particular chromosome".
The specification as originally filed, at page 3, lines 5 and 6, details that the gene encoding the presently
claimed sequences is present on "human chromosome 2, see GENBANK accession no. AC0025750".
In fact, alignment of SEQ ID NO:1 with GenBank Accession Number AC025750 (which is a genomic
clone from human chromosome 2) shows that the human gene corresponding to SEQ ID NO:1 is
dispersed on 2 exons of human chromosome 2 (alignment and the first page from the GenBank report
are presented in Exhibit U). Clearly, the present polynucleotide provides exquisite specificity in
localizing the specific region of human chromosome 2 that contains the gene encoding the given
polynucleotide, a utility not shared by virtually any other nucleic acid sequences. In fact, it is this
specificity that makes this particular sequence so useful. Early gene mapping techniques relied on
methods such as Giemsa staining to identify regions of chromosomes. However, such techniques
produced genetic maps with a resolution of only 5 to 10 megabases, far too low to be of much help in
identifying specific genes involved in disease. The skilled artisan readily appreciates the significant
benefit afforded by markers that map a specific locus of the human genome, such as the present nucleic
acid sequence. For further evidence in support of the Applicants' position, the Examiner is requested
to review, for example, section 3 of Venter et al. (supra, at pp. 1317-1321, including Fig. 11 at
pp. 1324-1325; see Exhibit S), which demonstrates the significance of expressed sequence information
in the structural analysis of genomic data. The presently claimed polynucleotide sequence defines a
biologically validated sequence that provides a unique and specific resource for mapping the genome
essentially as described in the Venter et al. article. Thus, the present claims clearly meet the
requirements of 35 U.S.C. § 101.

Applicants reiterate that only a minor percentage (2-4%) of the genome actually encodes 
exons, which in turn encode amino acid sequences. Significantly, the claimed polynucleotide sequence 
defines how the encoded exons are actually spliced together to produce an active transcript (i.e., the 
described sequences are useful for functionally defining exon splice-junctions). As described in the 
specification as originally filed at page 3, lines 6-9, the claimed "sequences identify actual, biologically 
relevant, exon splice junctions, as opposed to those that might have been predicted bioinformatically 
from genomic sequence alone". The specification as originally filed, at page 11, lines 13-18, further 
details that "sequences derived from regions adjacent to the intron/exon boundaries of the human gene 
can be used to design primers for use in amplification assays to detect mutations within the exons, 
introns, splice sites (e.g., splice acceptor and/or donor sites), etc., that can be used in diagnostics and 
pharmacogenomics". Applicants respectfully submit that the practical scientific value of biologically 
validated, expressed, spliced, and polyadenylated mRNA sequences is readily apparent to those skilled 
in the relevant biological and biochemical arts. Thus, the present sequence clearly meets the 
requirements of 35 U.S.C. § 101. 

Once again, the Examiner alleges that this asserted utility is "not specific or substantial" 
because "(s)uch assays can be performed with any polynucleotide" (the Action at page 7). With 
respect to the presently asserted utility, this argument is once again flawed in a number of respects. 
First, Applicants once again point out that only expressed sequences can be used in the identification 
of coding sequence, not just "any polynucleotide". Second, Applicants reiterate that the requirements 
of a specific utility, which is the proper standard for utility under 35 U.S.C. § 101, should not be 
confused with the requirement for a unique utility, which is clearly an improper standard (Carl Zeiss, 
supra). The fact that a small number of other nucleotide sequences could be used to map the protein 
coding regions in this specific region of chromosome 2 does not mean that the use of Applicants' 
sequence to map the protein coding regions of chromosome 2 is not a specific utility. Once again, the 
question of whether or not other nucleic acid sequences can be so used is completely irrelevant to the 
present utility inquiry. The only relevant question in regard to meeting the standards of 35 U.S.C. § 101 
is whether "any polynucleotide" can be so used - and the clear answer to this question is once again 
an emphatic no. Applicants respectfully point out the Examiner is once again attempting to narrow the 
generic class of "any polynucleotide" to include only the small number of nucleic acid molecules that 
are expressed from this particular region of chromosome 2 in order to support the allegation that the 
claimed nucleic acids lack a "specific" utility. Applicants respectfully point out once again that this is 
improper under the law as well as the policy of the USPTO. Thus, the present claims clearly meet 
the requirements of 35 U.S.C. § 101. 

Rather, as set forth by the Federal Circuit, "(t)he threshold of utility is not high: An invention is 
'useful' under section 101 if it is capable of providing some identifiable benefit." Juicy Whip Inc. v. 
Orange Bang Inc., 51 USPQ2d 1700 (Fed. Cir. 1999) (citing Brenner v. Manson, 383 U.S. 519, 
534 (1966)). Additionally, the Federal Circuit has stated that "(t)o violate § 101 the claimed device 
must be totally incapable of achieving a useful result." Brooktree Corp. v. Advanced Micro Devices, 
Inc., 977 F.2d 1555, 1571 (Fed. Cir. 1992), emphasis added. Cross v. Iizuka (224 USPQ 739 
(Fed. Cir. 1985); "Cross") states "any utility of the claimed compounds is sufficient to satisfy 
35 U.S.C. § 101". Cross at 748, emphasis added. Indeed, as discussed in Section II, above, the 
Federal Circuit recently emphatically confirmed that "anything under the sun that is made by man" is 
patentable (State Street Bank & Trust Co. v. Signature Financial Group Inc., supra, citing the 
U.S. Supreme Court's decision in Diamond vs. Chakrabarty, supra). 

Finally, the requirements set forth in the Action for compliance with 35 U.S.C. § 101 do not 
comply with the requirements set forth by the USPTO itself for compliance with 35 U.S.C. § 101. 
While Applicants are well aware of the new Utility Guidelines set forth by the USPTO, Applicants 
respectfully point out that the current rules and regulations regarding the examination of patent 
applications are and always have been the patent laws as set forth in 35 U.S.C. and the patent rules as set 
forth in 37 C.F.R., not the Manual of Patent Examination Procedure or particular guidelines for patent 
examination set forth by the USPTO. Furthermore, it is the job of the judiciary, not the USPTO, to 
interpret these laws and rules. Applicants are unaware of any significant recent changes in either 
35 U.S.C. § 101, or in the interpretation of 35 U.S.C. § 101 by the Supreme Court or the Federal 
Circuit that are in keeping with the new Utility Guidelines set forth by the USPTO. This is underscored 
by numerous patents that have been issued over the years that claim nucleic acid fragments that do not 
comply with the new Utility Guidelines. As just a few examples of such issued U.S. Patents, the 
Examiner is invited to review U.S. Patent Nos. 5,817,479, 5,654,173, and 5,552,281 (each of which 
claims short polynucleotides; Exhibits V-X; copies of issued U.S. Patents not provided pursuant to 
requests from the USPTO), and U.S. Patent No. 6,340,583 (which includes no working examples; 
Exhibit Y; copies of issued U.S. Patents not provided pursuant to requests from the USPTO), none 
of which contain examples of the "real-world" utilities that the Examiner seems to be requiring. As 
issued U.S. Patents are presumed to meet all of the requirements for patentability, including 
35 U.S.C. §§ 101 and 112, first paragraph (see Section VI, below), Applicants submit that the present 
polynucleotides must also meet the requirements of 35 U.S.C. § 101. While Applicants understand that 
each application is examined on its own merits, Applicants are unaware of any changes to 
35 U.S.C. § 101, or in the interpretation of 35 U.S.C. § 101 by the Supreme Court or the Federal 
Circuit, since the issuance of these patents that render the subject matter claimed in these patents, which 
is similar to the subject matter in question in the present application, suddenly non-statutory or failing 
to meet the requirements of 35 U.S.C. § 101. Thus, holding Applicants to a different standard of utility 
would be arbitrary and capricious, and, like other clear violations of due process, cannot stand. 

For each of the foregoing reasons, Applicants submit that as the presently claimed nucleic acid 
molecules have been shown to have a substantial, specific, credible and well-established utility, the 
rejection of claims 1, 3, and 5-11 under 35 U.S.C. § 101 has been overcome, and request that the 
rejection be withdrawn. 

VI. Rejection of Claims 1, 3, and 5-11 Under 35 U.S.C. § 112, First Paragraph 

The Action next rejects claims 1, 3, and 5-11 under 35 U.S.C. § 112, first paragraph, since 
allegedly one skilled in the art would not know how to use the invention, as the invention allegedly is 
not supported by a specific, substantial, and credible utility or a well-established utility. Applicants 
respectfully traverse. 

First, while Applicants in no way agree with the Examiner's position that one skilled in the art 
would not know how to use the invention as set forth in claims 1, 7 and 10, since claims 1, 7 and 10 
have been cancelled entirely without prejudice and without disclaimer, the present rejection of claims 1, 
7 and 10 under 35 U.S.C. § 112, first paragraph is rendered moot. The remainder of this section will 
therefore focus on claims 3, 5, 6, 8, 9 and 11. 

Applicants submit that as claims 3, 5, 6, 8, 9 and 11 have been shown to have "a specific, 
substantial, and credible utility", as detailed in section V above, the present rejection of claims 3, 5, 6, 
8, 9 and 11 under 35 U.S.C. § 112, first paragraph, cannot stand. 

Applicants therefore request that the rejection of claims 1, 3, and 5-11 under 35 U.S.C. § 112, 
first paragraph, be withdrawn. 

VII. Rejection of Claims 1, 6, and 9 Under 35 U.S.C. § 112, First Paragraph 

The Action next rejects claims 1, 6, and 9 under 35 U.S.C. § 112, first paragraph, as allegedly 
not providing enablement for the full scope of the claimed invention. While Applicants in no way agree 
with the Examiner's position that claims 1, 6, and 9 are not enabled for the full scope of the claim, as 
claim 1 has been cancelled entirely without prejudice and without disclaimer, and claim 6 has been 
amended to reference claim 3, which is not subject to the present rejection, the present rejection of 
claims 1, 6, and 9 under 35 U.S.C. § 112, first paragraph, has been overcome. 

Applicants therefore respectfully request that the rejection of claims 1, 6, and 9 under 
35 U.S.C. § 112, first paragraph, be withdrawn. 

VIII. Rejection of Claims 9-11 Under 35 U.S.C. § 112, First Paragraph 

The Action next rejects claims 9-11 under 35 U.S.C. § 112, first paragraph, as allegedly not 
enabled for the full scope of the claim. Applicants respectfully traverse. 

First, while Applicants in no way agree with the Examiner's position that the invention as set 
forth in claim 10 is not fully enabled, since claim 10 has been cancelled entirely without prejudice and 
without disclaimer, the present rejection of claim 10 under 35 U.S.C. § 112, first paragraph is rendered 
moot. The remainder of this section will therefore focus on claims 9 and 11. 

The Examiner states that "(t)he specification of the instant application teaches that NHP gene 
products (SEQ ID NO: 1) can be expressed in transgenic animals and any technique known in the art 
may be used to introduce a NHP transgene into animals to produce the founder lines of transgenic 
animals", but that claims 9 and 11 are not enabled for non-human transgenic animals because "there 
are no methods or working examples disclosed in the instant application whereby a multicellular animal 
with the incorporated NHP gene of SEQ ID NO: 1 is demonstrated to express the NHP peptide", and 
"(t)he unpredictability of the art is very high with regards to making transgenic animals" (the Action at 
page 10, emphasis in original). With regard to the Examiner's first argument, Applicants respectfully 
point out that this argument is not dispositive as to the question of enablement, for it has long been 
established that "there is no statutory requirement for the disclosure of a specific example" (In re Gay, 
309 F.2d 769, 135 USPQ 311 (CCPA, 1962)). Thus, this argument alone cannot support an 
allegation that claims 9 and 11 are not enabled. 

With regard to the Examiner's second argument, concerning the unpredictability in the art with 
regard to making transgenic animals, the Examiner cites four scientific articles that allegedly support this 
position. Specifically, the Examiner cites Wang et al. (Nuc. Acids Res. 27:4609-4618, 1999) and 
Kaufman et al. (Blood 94:3178-3184, 1999) to support the argument that expression levels of an 
inserted transgene are highly variable, Wigley et al. (Reprod. Fert. Dev. 6:585-588, 1994) to support 
the argument that production of non-human transgenic animals by pronuclear microinjection (one of the 
methods of producing non-human transgenic animals specifically cited in the specification as originally 
filed) suffers from limitations such as low frequency of integration events and random integration, and 
Campbell et al. (Theriology 47:63-72, 1997) to support the argument that production of non-human 
transgenic animals from ES cells (another method of producing non-human transgenic animals 
specifically cited in the specification as originally filed) has been difficult. Thus, the Examiner concludes 
that "it would have required undue experimentation for the skilled artisan to have made any and all 
transgenic non-human animals according to the instant invention" (the Action at page 11). 

Rather than list the numerous deficiencies of each of the articles cited by the Examiner, 
Applicants instead will present evidence of the state of the art with regard to making transgenic animals 
as of the filing date of the present application (December 10, 2001). The specification as originally 
filed, at page 17, lines 15-18, details that "(a)nimals of any species, including, but not limited to, worms, 
mice, rats, rabbits, guinea pigs, pigs, micro-pigs, birds, goats, and non-human primates, e.g., baboons, 
monkeys, and chimpanzees may be used to generate NHP transgenic animals". Applicants respectfully 
point out that there are numerous examples of transgenic worms (nematodes), mice, rats, rabbits, 
guinea pigs, pigs, birds (chickens), goats and monkeys, reported years and sometimes decades prior to 
the filing date of the present application. However, rather than provide hundreds of citations of transgenic 
animals that are in the art prior to the filing date of the present application, Applicants respectfully point 
out that the first report of a transgenic nematode was in 1988 (Spieth et al., Dev. Biol. 130:285-293; 
copy of abstract provided in Exhibit Z), the first report of a transgenic mouse was in 1980 
(Gordon et al., Proc. Natl. Acad. Sci. USA 77:7380-7384; copy of manuscript provided in 
Exhibit AA), the first report of a transgenic rat was in 1990 (Mullins et al., Nature 344:541-544; 
copy of abstract provided in Exhibit BB), the first report of a transgenic rabbit was in 1985 (Hammer 
et al., Nature 315:680-683; copy of abstract provided in Exhibit CC), a report of the production of 
human interleukin-2 in the milk of transgenic rabbits was published in 1990 (Buhler et al., 
Bio/Technology 8:140-143; copy of abstract provided in Exhibit DD), the first reports of transgenic 
guinea pigs were in 2000 (Suzuki et al., Gene Ther. 7:1046-1054, and Yagi et al., JARO 1:315-325; 
copies of abstracts provided in Exhibit EE), a report of the production of human growth hormone in 
the milk of transgenic guinea pigs was also published in 2000 (Hens et al., Biochim. Biophys. Acta 
1523:161-171; copy of abstract provided in Exhibit FF), the first report of a transgenic pig was in 
1985 (see Exhibit CC), a report of the production of a heterologous milk protein in the milk of 
transgenic pigs was published in 1991 (Wall et al., Proc. Natl. Acad. Sci. USA 88:1696-1700; copy 
of manuscript provided in Exhibit GG), the first reports of transgenic chickens were in 1987 (Salter 
et al., Virology 157:236-240; copy of abstract provided in Exhibit HH) and 1989 (Bosselman et al., 
J. Virol. 63:2680-2689; copy of abstract provided in Exhibit II), the first reports of transgenic goats 
were in 1991 (Ebert et al., Bio/Technology 9:835-838, and Denman et al., Bio/Technology 
9:839-843; copies of abstracts provided in Exhibit JJ), and the first report of a transgenic monkey 
(rhesus monkey) was in January of 2001 (Chan et al., Science 291:309-312; copy of manuscript 
provided in Exhibit KK). Additionally, the first report of a transgenic cow (raised by the Examiner 
on page 11 of the Action) was in 1991 (Krimpenfort et al., Bio/Technology 9:844-847; copy of 
abstract provided in Exhibit LL), the first report of a transgenic sheep (another example of a transgenic 
mammal) was in 1988 (Simons et al., Bio/Technology 6:179-183; copy of abstract provided in 
Exhibit MM), and a report of the production of human anti-hemophilic factor IX in the milk of 
transgenic sheep was published in 1989 (Clark et al., Bio/Technology 7:487-492; copy of abstract 
provided in Exhibit NN). Given the hundreds of reports of transgenic animals, of which the reports 
listed above are only the first examples, there can be no doubt that the making of transgenic animals 
is clearly enabled to those of skill in the art, which is all that is required to meet the enablement 
requirement under 35 U.S.C. § 112, first paragraph. 

The Examiner seems to believe that claims 9 and 11 are not enabled for transgenic animals 
because certain aspects of transgenic technology (expression levels, site-specific versus random 
integration) require some level of experimentation to perfect. However, Applicants respectfully point 
out that all that is required in order to satisfy the enablement requirement under 35 U.S.C. § 112, first 
paragraph, is making any transgenic animal, not the perfect transgenic animal. Any detectable level 
of expression of a transgene, for example SEQ ID NO: 1, is all that is required, for it is well established 
that the enablement requirement is met if any use of the invention (or in this case, certain aspects of the 
invention) is provided (In re Nelson, 126 USPQ 242 (CCPA 1960); Cross v. Iizuka, supra). "The 
enablement requirement is met if the description enables any mode of making and using the invention." 
Johns Hopkins Univ. v. CellPro, Inc., 47 USPQ2d 1705, 1719 (Fed. Cir. 1998), citing Engel 
Indus., Inc. v. Lockformer Co., 20 USPQ2d 1300, 1304 (Fed. Cir. 1991). Furthermore, a 
specification "need describe the invention only in such detail as to enable a person skilled in the most 
relevant art to make and use it." In re Naquin, 158 USPQ 317, 319 (CCPA 1968); emphasis added. 
Therefore, as the skilled artisan is clearly able to make a variety of different species of transgenic 
animals, claims 9 and 11 are thus enabled as they are supported by a specification that provides 
sufficient description to enable the skilled person to make and use the invention as claimed. 

The Examiner states that the present invention could not be practiced without "undue 
experimentation". However, it is important to remember that in assessing the question of whether undue 
experimentation would be required in order to practice the claimed invention, the key term is "undue", 
not "experimentation". In re Angstadt and Griffin, supra. The large number of reports in the 
literature on a variety of transgenic animals strongly argues against such a use requiring "undue 
experimentation". In In re Wands (supra), the USPTO took the position that the applicant failed to 
demonstrate that the disclosed biological processes of immunization and antibody selection could 
reproducibly result in a useful biological product (antibodies from hybridomas) within the scope of the 
claims. In its decision overturning the USPTO's rejection, the Federal Circuit found that Wands' 
demonstration of success in four out of nine cell lines screened was sufficient to support a conclusion 
of enablement. The court emphasized that the need for some experimentation requiring, e.g., production 
of the biological material followed by routine screening, was not a basis for a finding of non-enablement, 
stating: 

Disclosure in application for the immunoassay method patent does not fail to meet 
enablement requirement of 35 USC 112 by requiring 'undue experimentation', even 
though production of monoclonal antibodies necessary to practice invention first 
requires production and screening of numerous antibody producing cells or 
'hybridomas', since practitioners of art are prepared to screen negative hybridomas 
in order to find those that produce desired antibodies, since in monoclonal antibody art 
one 'experiment' is not simply screening of one hybridoma but rather is entire attempt 
to make desired antibody, and since record indicates that amount of effort needed to 
obtain desired antibodies is not excessive, in view of Applicants' success in each 
attempt to produce antibody that satisfied all claim limitations. 

Wands at 1400. Thus, the need for some experimentation does not render the claimed invention 
unpatentable under 35 U.S.C. § 112, first paragraph. Indeed, a considerable amount of 
experimentation may be permissible if such experimentation is routinely practiced in the art. In re 
Angstadt and Griffin, supra; Amgen, Inc. v. Chugai Pharmaceutical Co., Ltd., supra. Therefore, 
given the evidence detailed above concerning the ability of the skilled artisan to produce transgenic 
animals with some detectable level of transgene expression with reasonable certainty, claims 9 and 11 
meet the enablement requirement. 

The Examiner next states that claims 9 and 11 are not enabled because "(t)he specification also 
discloses that 'nucleotide constructs encoding such NHP products can be used to genetically engineer 
host cells to express such products in vivo' and that these products can be used in gene therapy" (the 
Action at page 12), but "the specification does not teach any methods or working examples that 
indicate a NHP nucleic acid is introduced and expressed in a cell for therapeutic purposes", and the 
"(r)elevant literature teaches that since 1990, about 3500 patients have been treated via gene therapy 
and although some evidence of gene transfer has been seen, it has generally been inadequate for a 
meaningful clinical response" (the Action at page 12), citing a journal article by Phillips (J. Pharm. 
Pharmacology 53:1169-1174, 2001). Once again, with regard to the Examiner's first argument, 
Applicants respectfully point out that this argument is not dispositive as to the question of enablement, 
for it has long been established that "there is no statutory requirement for the disclosure of a specific 
example" (In re Gay, supra). Thus, this argument alone cannot support an allegation that claims 9 and 
11 are not enabled. 

With regard to the Examiner's second argument, it once again appears that the Examiner 
believes that claims 9 and 11 are not enabled for gene therapy because gene therapy is not always 
effective. However, Applicants once again point out that it is well established that the enablement 
requirement is met if any use of the invention is provided (In re Nelson, supra; Cross v. Iizuka, 
supra). "The enablement requirement is met if the description enables any mode of making and using 
the invention." Johns Hopkins Univ. v. CellPro, Inc., supra. Furthermore, a specification "need 
describe the invention only in such detail as to enable a person skilled in the most relevant art to make 
and use it." In re Naquin, supra; emphasis added. Applicants respectfully point out that there are a 
number of reports in the literature, prior to the filing date of the present application, concerning a variety 
of gene therapy vectors and successful gene therapy regimens. In fact, the article by Phillips cited by 
the Examiner herself admits that gene therapy can and has been practiced by the skilled artisan ("since 
1990, about 3500 patients have been treated via gene therapy" and "some evidence of gene transfer 
has been seen"). Therefore, claims 9 and 11 are clearly enabled as they are supported by a 
specification that provides sufficient description to enable the skilled person to make and use the 
invention as claimed. 

Furthermore, with regard to a requirement that the host cells of claims 9 and 11 be nearly 
always effective in gene therapy, such an enablement standard conflicts with established patent law. 
As discussed in In re Brana ("Brana", supra), the Federal Circuit admonished the USPTO for 
confusing "the requirements under the law for obtaining a patent with the requirements for obtaining 
government approval to market a particular drug for human consumption". Thus, based on the holding 
in Brana, claims 9 and 11 clearly meet the enablement requirement under 35 U.S.C. § 112, first 
paragraph. 

The Examiner then once again concludes that "undue experimentation would be required of the 
skilled artisan to introduce and express a NHP nucleic acid into the cells of an organism" (the Action 
at page 12). However, Applicants reiterate that in assessing the question of whether undue 
experimentation would be required in order to practice the claimed invention, the key term is "undue", 
not "experimentation" (In re Angstadt and Griffin, supra). Once again, the large number of reports 
in the literature on a variety of gene therapy vectors, and advances in gene therapy techniques, strongly 
argues against such a use requiring "undue experimentation". However, even if, arguendo, further 
experimentation might be required in certain aspects of the present invention, this does not preclude a 
finding that the invention is enabled, as set forth by the Federal Circuit's holding in Brana, which clearly 
states, as highlighted in the quote above, that "pharmaceutical inventions, necessarily includes the 
expectation of further research and development" (Brana at 1442-1443, emphasis added). 
Furthermore, the need for some experimentation does not render the claimed invention unpatentable 
under 35 U.S.C. § 112, first paragraph (In re Wands, supra). Indeed, a considerable amount of 
experimentation may be permissible if such experimentation is routinely practiced in the art. In re 
Angstadt and Griffin, supra; Amgen, Inc. v. Chugai Pharmaceutical Co., Ltd., supra. Therefore, 
given the evidence detailed above concerning the ability of the skilled artisan to create gene therapy 
constructs that have some level of success, claims 9 and 11 meet the enablement requirement. 

Therefore, based on the evidence of record that it is well-known to the skilled artisan how to make 
and use a variety of species of transgenic animals, as well as a variety of gene therapy vectors, the 
35 U.S.C. § 112, first paragraph, rejection is improper: 

As a matter of patent office practice, then, a specification disclosure which contains a 
teaching of the manner and process of making and using the invention in terms which 
correspond in scope to those used in describing and defining the subject matter sought 
to be patented must be taken as in compliance with the enabling requirement of the first 
paragraph of § 112 unless there is reason to doubt the objective truth of the statements 
contained therein which must be relied on for enabling support. 

In re Marzocchi, 169 USPQ 367, 369 (CCPA 1971), emphasis as in original. Applicants respectfully 
point out that, as a matter of law, it is well settled that a patent need not disclose what is well-known 
in the art. In re Wands, supra. In fact, it is preferable that what is well-known in the art be omitted 
from the disclosure. Hybritech, Inc. v. Monoclonal Antibodies, Inc., 231 USPQ 81 (Fed. Cir. 
1986). Therefore, the full breadth of claims 9 and 11 is clearly enabled. 

Applicants therefore request that the rejection of claims 9-11 under 35 U.S.C. § 112, first 
paragraph, be withdrawn. 

IX. Rejection of Claims 1, 6, and 9 Under 35 U.S.C. § 112, First Paragraph 

The Action next rejects claims 1, 6, and 9 under 35 U.S.C. § 112, first paragraph, as allegedly 
containing subject matter that was not described in the specification in such a way as to reasonably 
convey to one skilled in the relevant art that the inventors, at the time the application was filed, had 
possession of the claimed invention. While Applicants in no way agree with the Examiner's position 
that claims 1, 6, and 9 do not meet the requirements of 35 U.S.C. § 112, first paragraph, as claim 1 
has been cancelled entirely without prejudice and without disclaimer, and claim 6 has been amended 
to reference claim 3, which is not subject to the present rejection, the present rejection of claims 1, 6, 
and 9 under 35 U.S.C. § 112, first paragraph, has been overcome. 

Applicants therefore respectfully request that the rejection of claims 1, 6, and 9 under 
35 U.S.C. § 112, first paragraph, be withdrawn. 

X. Rejection of Claims 1, 6, and 9 Under 35 U.S.C. § 102(e) 

The Action next rejects claims 1, 6, and 9 under 35 U.S.C. § 102(e), as allegedly anticipated 
by Tang et al. (US 20040014945A1; "Tang"). While Applicants do not necessarily agree with the 
Examiner's position that claims 1, 6, and 9 are anticipated by Tang, as claim 1 has been cancelled 
entirely without prejudice and without disclaimer, and claim 6 has been amended to reference claim 3, 
which is not subject to the present rejection, the present rejection of claims 1, 6, and 9 under 
35 U.S.C. § 102(e) has been overcome. 

Applicants therefore respectfully request that the rejection of claims 1, 6, and 9 under 
35 U.S.C. § 102(e) be withdrawn. 

XI. Conclusion 

The present document is a full and complete response to the Action. In conclusion, Applicants 
submit that, in light of the foregoing remarks, the present case is in condition for allowance, and such 
favorable action is respectfully requested. Should Examiner Burner have any questions or comments, 
or believe that certain amendments of the claims might serve to improve their clarity, a telephone call 
to the undersigned Applicants' representative is earnestly solicited. 



Respectfully submitted, 

(54) Title: NOVEL VOLTAGE-DEPENDENT POTASSIUM CHANNEL AND ITS USE FOR THE DEVELOPMENT 
OF THERAPEUTICS 



[Figure: hKv6.2 genomic and cDNA sequence comparison, with Probe A and a 200 bp scale bar] 



(57) Abstract 

The invention relates to a novel tension-dependent potassium channel protein Kv6.2 (SEQ ID NO: 1). The Kv6.2 gene is expressed 
preferably in the myocardium or in the hippocampus. Novel functional heteromultimeric potassium channels having high affinity with 
propafenone are formed in conjunction with subunit Kv2.1. According to the invention, said novel potassium channels are used in test 
systems which are suitable for identifying substances modulating, opening or closing the Kv2.1/Kv6.2 channels and which can be used as 
therapeutic agents. 



EXHIBIT B 



>AF454547 ACCESSION: AF454547 NID: gi 22164081 gb AF454547.1 Homo 
sapiens voltage-gated potassium channel subunit Kv10.1a 
mRNA, complete cds, alternatively spliced 
Length = 3670 

Score = 867 bits (2215), Expect = 0.0 

Identities = 425/425 (100%), Positives = 425/425 (100%) 
Frame = +1 

Query: 1    MTFGRSGAASVVLNVGGARYSLSRELLKDFPLRRVSRLHGCRSERDVLEVCDDYDRERNE 60
            MTFGRSGAASVVLNVGGARYSLSRELLKDFPLRRVSRLHGCRSERDVLEVCDDYDRERNE
Sbjct: 478  MTFGRSGAASVVLNVGGARYSLSRELLKDFPLRRVSRLHGCRSERDVLEVCDDYDRERNE 657

Query: 61   YFFDRHSEAFGFILLYVRGHGKLRFAPRMCELSFYNEMIYWGLEGAHLEYCCQRRLDDRM 120
            YFFDRHSEAFGFILLYVRGHGKLRFAPRMCELSFYNEMIYWGLEGAHLEYCCQRRLDDRM
Sbjct: 658  YFFDRHSEAFGFILLYVRGHGKLRFAPRMCELSFYNEMIYWGLEGAHLEYCCQRRLDDRM 837

Query: 121  SDTYTFYSADEPGVLGRDEARPGGAEAAPSRRWLERMRRTFEEPTSSLAAQILASVSVVF 180
            SDTYTFYSADEPGVLGRDEARPGGAEAAPSRRWLERMRRTFEEPTSSLAAQILASVSVVF
Sbjct: 838  SDTYTFYSADEPGVLGRDEARPGGAEAAPSRRWLERMRRTFEEPTSSLAAQILASVSVVF 1017

Query: 181  VIVSMVVLCASTLPDWRNAAADNRSLDDRSRIIEAICIGWFTAECIVRFIVSKNKCEFVK 240
            VIVSMVVLCASTLPDWRNAAADNRSLDDRSRIIEAICIGWFTAECIVRFIVSKNKCEFVK
Sbjct: 1018 VIVSMVVLCASTLPDWRNAAADNRSLDDRSRIIEAICIGWFTAECIVRFIVSKNKCEFVK 1197

Query: 241  RPLNIIDLLAITPYYISVLMTVFTGENSQLQRAGVTLRVLRMMRIFWVIKLARHFIGLQT 300
            RPLNIIDLLAITPYYISVLMTVFTGENSQLQRAGVTLRVLRMMRIFWVIKLARHFIGLQT
Sbjct: 1198 RPLNIIDLLAITPYYISVLMTVFTGENSQLQRAGVTLRVLRMMRIFWVIKLARHFIGLQT 1377

Query: 301  LGLTLKRCYREMVMLLVFICVAMAIFSALSQLLEHGLDLETSNKDFTSIPAACWWVIISM 360
            LGLTLKRCYREMVMLLVFICVAMAIFSALSQLLEHGLDLETSNKDFTSIPAACWWVIISM
Sbjct: 1378 LGLTLKRCYREMVMLLVFICVAMAIFSALSQLLEHGLDLETSNKDFTSIPAACWWVIISM 1557

Query: 361  TTVGYGDMYPITVPGRILGGVCVVSGIVLLALPITFIYHSFVQCYHELKFRSARYSRSLS 420
            TTVGYGDMYPITVPGRILGGVCVVSGIVLLALPITFIYHSFVQCYHELKFRSARYSRSLS
Sbjct: 1558 TTVGYGDMYPITVPGRILGGVCVVSGIVLLALPITFIYHSFVQCYHELKFRSARYSRSLS 1737

Query: 421  TEFLN 425
            TEFLN
Sbjct: 1738 TEFLN 1752
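
The subject coordinates and reading frame in the alignment above can be checked with simple arithmetic. The following is a minimal sketch, assuming a gap-free, forward-strand alignment of a protein query against a translated nucleotide subject (as the 425/425 identities suggest); the function names are illustrative only and are not part of any BLAST program.

```python
def subject_end(subject_start: int, aligned_residues: int) -> int:
    """Last nucleotide (1-based) covered by a gap-free protein alignment.

    Each aligned amino acid consumes three nucleotides of the subject.
    """
    return subject_start + 3 * aligned_residues - 1


def forward_frame(subject_start: int) -> int:
    """Forward reading frame (1, 2 or 3) implied by a 1-based start coordinate."""
    return (subject_start - 1) % 3 + 1


# Values taken from the AF454547 alignment above: start 478, 425 residues.
print(subject_end(478, 425))
print(forward_frame(478))
```

For the AF454547 hit this gives an end coordinate of 1752 and frame +1, in agreement with the reported alignment.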



LOCUS       AF454547 3670 bp mRNA linear PRI 09-JUL-2004
DEFINITION  Homo sapiens voltage-gated potassium channel subunit Kv10.1a mRNA,
            complete cds, alternatively spliced.
ACCESSION   AF454547
VERSION     AF454547.1 GI:22164081
KEYWORDS
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 3670)
  AUTHORS   Vega-Saenz de Miera,E.C.
  TITLE     Modification of Kv2.1 K+ currents by the silent Kv10 subunits
  JOURNAL   Brain Res. Mol. Brain Res. 123 (1-2), 91-103 (2004)
  PUBMED    15046870
REFERENCE   2 (bases 1 to 3670)
  AUTHORS   Vega-Saenz de Miera,E.C. and Rudy,B.
  TITLE     Kv10.1a and Kv10.1b: Two novel alternatively spliced potassium
            channel subunits
  JOURNAL   Unpublished
REFERENCE   3 (bases 1 to 3670)
  AUTHORS   Vega-Saenz de Miera,E.C. and Rudy,B.
  TITLE     Direct Submission
  JOURNAL   Submitted (04-DEC-2001) Physiology and Neuroscience, New York
            University School of Medicine, 550 First Avenue, New York, NY
            10016, USA
FEATURES             Location/Qualifiers
     source          1..3670
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /chromosome="2"
                     /map="2p22-p21"
     CDS             478..1755
                     /note="alternatively spliced"
                     /codon_start=1
                     /product="voltage-gated potassium channel subunit Kv10.1a"
                     /protein_id="AAM93548.1"
                     /db_xref="GI:22164082"
                     /translation="MTFGRSGAASVVLNVGGARYSLSRELLKDFPLRRVSRLHGCRSE
                     RDVLEVCDDYDRERNEYFFDRHSEAFGFILLYVRGHGKLRFAPRMCELSFYNEMIYWG
                     LEGAHLEYCCQRRLDDRMSDTYTFYSADEPGVLGRDEARPGGAEAAPSRRWLERMRRT
                     FEEPTSSLAAQILASVSVVFVIVSMVVLCASTLPDWRNAAADNRSLDDRSRIIEAICI
                     GWFTAECIVRFIVSKNKCEFVKRPLNIIDLLAITPYYISVLMTVFTGENSQLQRAGVT
                     LRVLRMMRIFWVIKLARHFIGLQTLGLTLKRCYREMVMLLVFICVAMAIFSALSQLLE
                     HGLDLETSNKDFTSIPAACWWVIISMTTVGYGDMYPITVPGRILGGVCVVSGIVLLAL
                     PITFIYHSFVQCYHELKFRSARYSRSLSTEFLN"
     polyA_signal    3650..3655
     polyA_site      3670
ORIGIN 

1 ggcctctcgc ctctggacgg cggcggggcg gccgccggat tcgcggccgc agggagcgcc 
61 ggagacgggg agctattccg ccccggcggc tccattcggc gcccgcagcc ctcagggggt 
121 cggccccgcg gcttgggaga gggcaccgcg gcctcggtgt gcgcagccct cgggcgcgag 
181 ggtcggcggc gcggacacag ccgcgttccc agccggtggg gctcagcgct ggcgccggca 
241 aggactcccc ggccacccgc aggtaccgcc gggcggaggg cgcgctacta gcagcgccgg 
301 agatactcga gcccagggac ccccgggcca gcggagggca ggagcggagc cccgagggag 
361 cgcgggcccc gacggcgcgc tcccccgtca gccacgggca ggcaggcccc gcgtggcggc 
421 ttggggtggg gggctgcagc ggggccctcg ggccgaaagt cccccgggcg gccagccatg 
481 accttcgggc gcagcggggc ggcctcggtg gtgctgaacg tgggcggcgc ccggtattcg 
541 ctgtcccggg agctgctgaa ggacttcccg ctgcgccgcg tgagccggct gcacggctgc 
601 cgctccgagc gcgacgtgct cgaggtgtgc gacgactacg accgcgagcg caacgagtac 
661 ttcttcgacc ggcactcgga ggccttcggc ttcatcctgc tctacgtgcg cggccacggc 
721 aagctgcgct tcgcgccgcg gatgtgcgag ctctccttct acaacgagat gatctactgg 
781 ggcctggagg gcgcgcacct cgagtactgc tgccagcgcc gcctcgacga ccgcatgtcc 
841 gacacctaca ccttctactc ggccgacgag ccgggcgtgc tgggccgcga cgaggcgcgc 
901 cccggcgggg ccgaggcggc tccctccagg cgctggctgg agcgcatgcg gcggaccttc 
961 gaggagccca cgtcgtcgct ggccgcgcag atcctggcta gcgtgtcggt ggtgttcgtg 
1021 atcgtgtcca tggtggtgct gtgcgccagc acgttgcccg actggcgcaa cgcagccgcc 
1081 gacaaccgca gcctggatga ccggagcagg ataattgaag ctatctgcat aggttggttc 
1141 actgccgagt gcatcgtgag gttcattgtc tccaaaaaca agtgtgagtt tgtcaagaga 
1201 cccctgaaca tcattgattt actggcaatc acgccgtatt acatctctgt gttgatgaca 
1261 gtgtttacag gcgagaactc tcaactccag agggctggag tcaccttgag ggtacttaga 
1321 atgatgagga ttttttgggt gattaagctt gcccgtcact tcattggtct tcagacactc 
1381 ggtttgactc tcaaacgttg ctaccgagag atggttatgt tacttgtctt catttgtgtt 
1441 gccatggcaa tctttagtgc actttctcag cttcttgaac atgggctgga cctggaaaca 
1501 tccaacaagg actttaccag cattcctgct gcctgctggt gggtgattat ctctatgact 
1561 acagttggct atggagatat gtatcctatc acagtgcctg gaagaattct tggaggagtt 
1621 tgtgttgtca gtggaattgt tctattggca ttacctatca cttttatcta ccatagcttt 
1681 gtgcagtgtt atcatgagct caagtttaga tctgctaggt atagtaggag cctctccact 
1741 gaattcctga attaatgcat tgcaaatcaa ttcttgcata cacttcatag aaagactttg 
1801 atgctgcttc atatttatgt gtttcttgct gggtgagcac tgcagtggca ttgtcatcat 
1861 cttggtaggg taaaaattat ccttcccagc cgaagggata aaacagttta cttgttatgg 
1921 agtaaataga attgagactg caaaggaaga ataatgactc ctagagtaaa ctttaggacc 
1981 cggttttatt tagacttgtt ttcccgtttc cttgaatgat tacacatttt taaaaaatac 
2041 attatttgaa cattttaaaa cagaaaggta ctattttcca aatgtttttc catcttatga 
2101 attcagtaga agcttggaac ttatagtgtt ttttgtttga gagtaacatt ttcatttcta 
2161 aatgttttat aatttctcat atcaatgtca gaagtatcct ggaaacatat gtcacatgcg 
2221 ggaactgttt aacaaatact ttaaaaattt ggccaaaatt taaactgtat aatggagcta 
2281 gatacaagca agaatagtat ttgaaagact tttccagcat acttctcaat tctttgcttt 
2341 atttttgtgc caattattca ccttatcgtg ccgcttcatg gaagcttgag tatgttctcc 
2401 cttttccatt ttggatttat ctctttactg taatgactca aaaggtattt aagaattgac 
2461 gagagcttgt gttgtttagc atcttactgg ataatatttg aattcattgc tgttcctagg 
2521 tgataactgt cctaatattt agatgtccaa acaagaatac ttccaacata aaaattataa 
2581 taggaataat ttgagatgac tcaatattac aacctcttct tctcttaacc tcctccccca 
2641 aacactagag gtttaataag acttatcaga tgaaaggata tttatatagc cttttagtag 
2701 caaagtcata cttacgtgtt gtcactggat tatcataaaa gggagaaatt aaatattact 
2761 gtactcttag ttgctgtgta gctaagtcaa ttttaagcca gtaaaagcga tggatacata 
2821 atgatttgat ctgatcttta actattgtga atcacagcta caccaaaact cttcttgtaa 
2881 gaatactgac taatatgcca tgttaatctg gctagattat taggactaga taatgtaaaa 
2941 gtgatgattg tttagtaact aaattttagc aacagaaatt agaattttgc tttttcaacc 
3001 agttaccata aagaagttag tgtatatata aacacaaata attagtgaca gattcataaa 
3061 aaattgaatg ttgtacacag taattttgtc agaggtagag aagacaggga ttgggaagtg 
3121 gtgggtgatg gaggacctgg atatatttat caaataaagg gttaccagaa gtgttcatta 
3181 aaggaatttt agccatcatc tagttcaagc ctcaactatt acaggtagaa aatcagggca 
3241 ggagagaata taattgtgaa ggagtcaggg ctaacacctg gatctccaga aacctagccc 
3301 agcaggttaa tcttcacaca tctctgggtt ctgagaaaag cctggaaaaa tcacacttct 
3361 ttgtcattgt catgctgagg taataatagc aaaactgttt tctttccctt aatttccttt 
3421 cctaagctta tgtaatagtt tggccattaa atatcttgcc ctattttccc tattactgct 
3481 agtatgctac ttcttacata cccaaaagaa attcagttat ttattgtata tttattgtat 
3541 tctaatataa ttgaaataaa tggcatggat ttattttttc ttaactattt ggattaaagc 
3601 tttgtggttc atgcaaacaa tgtgcagatg atagcacctc catattacta ataaaaatat 
3661 gataaccatc 
>NM_172344 ACCESSION: NM_172344 NID: gi 27436992 ref NM_172344.1 Homo 
sapiens potassium voltage-gated channel, subfamily G, 
member 3 (KCNG3), transcript variant 2, mRNA 
Length = 3791 

Score = 867 bits (2215), Expect = 0.0 

Identities = 425/425 (100%), Positives = 425/425 (100%) 
Frame = +3 

Query: 1    MTFGRSGAASVVLNVGGARYSLSRELLKDFPLRRVSRLHGCRSERDVLEVCDDYDRERNE 60
            MTFGRSGAASVVLNVGGARYSLSRELLKDFPLRRVSRLHGCRSERDVLEVCDDYDRERNE
Sbjct: 597  MTFGRSGAASVVLNVGGARYSLSRELLKDFPLRRVSRLHGCRSERDVLEVCDDYDRERNE 776

Query: 61   YFFDRHSEAFGFILLYVRGHGKLRFAPRMCELSFYNEMIYWGLEGAHLEYCCQRRLDDRM 120
            YFFDRHSEAFGFILLYVRGHGKLRFAPRMCELSFYNEMIYWGLEGAHLEYCCQRRLDDRM
Sbjct: 777  YFFDRHSEAFGFILLYVRGHGKLRFAPRMCELSFYNEMIYWGLEGAHLEYCCQRRLDDRM 956

Query: 121  SDTYTFYSADEPGVLGRDEARPGGAEAAPSRRWLERMRRTFEEPTSSLAAQILASVSVVF 180
            SDTYTFYSADEPGVLGRDEARPGGAEAAPSRRWLERMRRTFEEPTSSLAAQILASVSVVF
Sbjct: 957  SDTYTFYSADEPGVLGRDEARPGGAEAAPSRRWLERMRRTFEEPTSSLAAQILASVSVVF 1136

Query: 181  VIVSMVVLCASTLPDWRNAAADNRSLDDRSRIIEAICIGWFTAECIVRFIVSKNKCEFVK 240
            VIVSMVVLCASTLPDWRNAAADNRSLDDRSRIIEAICIGWFTAECIVRFIVSKNKCEFVK
Sbjct: 1137 VIVSMVVLCASTLPDWRNAAADNRSLDDRSRIIEAICIGWFTAECIVRFIVSKNKCEFVK 1316

Query: 241  RPLNIIDLLAITPYYISVLMTVFTGENSQLQRAGVTLRVLRMMRIFWVIKLARHFIGLQT 300
            RPLNIIDLLAITPYYISVLMTVFTGENSQLQRAGVTLRVLRMMRIFWVIKLARHFIGLQT
Sbjct: 1317 RPLNIIDLLAITPYYISVLMTVFTGENSQLQRAGVTLRVLRMMRIFWVIKLARHFIGLQT 1496

Query: 301  LGLTLKRCYREMVMLLVFICVAMAIFSALSQLLEHGLDLETSNKDFTSIPAACWWVIISM 360
            LGLTLKRCYREMVMLLVFICVAMAIFSALSQLLEHGLDLETSNKDFTSIPAACWWVIISM
Sbjct: 1497 LGLTLKRCYREMVMLLVFICVAMAIFSALSQLLEHGLDLETSNKDFTSIPAACWWVIISM 1676

Query: 361  TTVGYGDMYPITVPGRILGGVCVVSGIVLLALPITFIYHSFVQCYHELKFRSARYSRSLS 420
            TTVGYGDMYPITVPGRILGGVCVVSGIVLLALPITFIYHSFVQCYHELKFRSARYSRSLS
Sbjct: 1677 TTVGYGDMYPITVPGRILGGVCVVSGIVLLALPITFIYHSFVQCYHELKFRSARYSRSLS 1856

Query: 421  TEFLN 425
            TEFLN
Sbjct: 1857 TEFLN 1871
LOCUS       NM_172344 3791 bp mRNA linear PRI 20-DEC-2004
DEFINITION  Homo sapiens potassium voltage-gated channel, subfamily G, member 3
            (KCNG3), transcript variant 2, mRNA.
ACCESSION   NM_172344
VERSION     NM_172344.1 GI:27436992
KEYWORDS
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 3791)
  AUTHORS   Vega-Saenz de Miera,E.C.
  TITLE     Modification of Kv2.1 K+ currents by the silent Kv10 subunits
  JOURNAL   Brain Res. Mol. Brain Res. 123 (1-2), 91-103 (2004)
  PUBMED    15046870
REFERENCE   2 (bases 1 to 3791)
  AUTHORS   Ottschytsch,N., Raes,A., Van Hoorick,D. and Snyders,D.J.
  TITLE     Obligatory heterotetramerization of three previously
            uncharacterized Kv channel alpha-subunits identified in the human
            genome
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 99 (12), 7986-7991 (2002)
  MEDLINE   22056098
  PUBMED    12060745
  REMARK    GeneRIF: Obligatory heterotetramerization of three previously
            uncharacterized Kv channel subunits identified in human genome
            (Kv6.3) (Kv10.1) (Kv11.1)
            GeneRIF: Obligatory heterotetramerization of three previously
            uncharacterized Kv channel subunits identified in human genome
REFERENCE   3 (bases 1 to 3791)
  AUTHORS   Sano,Y., Mochizuki,S., Miyake,A., Kitada,C., Inamura,K., Yokoi,H.,
            Nozawa,K., Matsushime,H. and Furuichi,K.
  TITLE     Molecular cloning and characterization of Kv6.3, a novel modulatory
            subunit for voltage-gated K(+) channel Kv2.1
  JOURNAL   FEBS Lett. 512 (1-3), 230-234 (2002)
  MEDLINE   21841130
  PUBMED    11852086
  REMARK    GeneRIF: These results indicate that Kv6.3 is a novel member of the
            voltage-gated K(+) channel which functions as a modulatory subunit
            of the Kv2.1 channel.
COMMENT     REVIEWED REFSEQ: This record has been curated by NCBI staff. The
            reference sequence was derived from AF454547.1 and AF348982.1.

            Summary: Voltage-gated potassium (Kv) channels represent the most
            complex class of voltage-gated ion channels from both functional
            and structural standpoints. Their diverse functions include
            regulating neurotransmitter release, heart rate, insulin secretion,
            neuronal excitability, epithelial electrolyte transport, smooth
            muscle contraction, and cell volume. This gene encodes a member of
            the potassium channel, voltage-gated, subfamily G. This member is a
            gamma subunit functioning as a modulatory molecule. Alternative
            splicing results in two transcript variants encoding distinct
            isoforms.

            Transcript Variant: This variant (2), also known as Kv10.1a, lacks
            an alternate in-frame segment in the coding region, as compared to
            variant 1. It encodes isoform (2) that lacks an internal segment,
            as compared to isoform 1.
            COMPLETENESS: complete on the 3' end.
FEATURES             Location/Qualifiers
     source          1..3791
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /chromosome="2"
                     /map="2p21"
     gene            1..3791
                     /gene="KCNG3"
                     /note="synonyms: KV6.3, KV10.1"
                     /db_xref="GeneID:170850"
                     /db_xref="MIM:606767"
     CDS             597..1874
                     /gene="KCNG3"
                     /note="isoform 2 is encoded by transcript variant 2;
                     voltage-gated potassium channel 6.3; voltage-gated
                     potassium channel Kv10.1; voltage-gated potassium channel
                     subunit Kv6.4;
                     go_component: plasma membrane [goid 0005886] [evidence
                     IDA] [pmid 12060745];
                     go_component: integral to membrane [goid 0016021]
                     [evidence IEA];
                     go_component: endoplasmic reticulum [goid 0005783]
                     [evidence IDA] [pmid 12060745];
                     go_component: voltage-gated potassium channel complex
                     [goid 0008076] [evidence IEA];
                     go_function: protein binding [goid 0005515] [evidence IPI]
                     [pmid 11852086];
                     go_function: voltage-gated potassium channel activity
                     [goid 0005249] [evidence IEA];
                     go_process: cation transport [goid 0006812] [evidence
                     IEA];
                     go_process: potassium ion transport [goid 0006813]
                     [evidence IEA]"
                     /codon_start=1
                     /product="potassium voltage-gated channel, subfamily G,
                     member 3 isoform 2"
                     /protein_id="NP_758847.1"
                     /db_xref="GI:27436993"
                     /db_xref="GeneID:170850"
                     /db_xref="MIM:606767"
                     /translation="MTFGRSGAASVVLNVGGARYSLSRELLKDFPLRRVSRLHGCRSE
                     RDVLEVCDDYDRERNEYFFDRHSEAFGFILLYVRGHGKLRFAPRMCELSFYNEMIYWG
                     LEGAHLEYCCQRRLDDRMSDTYTFYSADEPGVLGRDEARPGGAEAAPSRRWLERMRRT
                     FEEPTSSLAAQILASVSVVFVIVSMVVLCASTLPDWRNAAADNRSLDDRSRIIEAICI
                     GWFTAECIVRFIVSKNKCEFVKRPLNIIDLLAITPYYISVLMTVFTGENSQLQRAGVT
                     LRVLRMMRIFWVIKLARHFIGLQTLGLTLKRCYREMVMLLVFICVAMAIFSALSQLLE
                     HGLDLETSNKDFTSIPAACWWVIISMTTVGYGDMYPITVPGRILGGVCVVSGIVLLAL
                     PITFIYHSFVQCYHELKFRSARYSRSLSTEFLN"
     polyA_signal    3769..3774
                     /gene="KCNG3"
     polyA_site      [location not legible in source]
ORIGIN
1 gcggcgcgga gggaggtgag cggcgcgcgc ggagccggcg ggcgaggagg aggactgcac
61 agaggccccg cccccgccgc cgcgagccgg ctcttcgccg cctccgaacc cgctcacttt
121 gcctctcgcc tctggacggc ggcggggcgg ccgccggatt cgcggccgca gggagcgccg
181 gagacgggga gctattccgc cccggcggct ccattcggcg cccgcagccc tcagggggtc
241 ggccccgcgg cttgggagag ggcaccgcgg cctcggtgtg cgcagccctc gggcgcgagg
301 gtcggcggcg cggacacagc cgcgttccca gccggtgggg ctcagcgctg gcgccggcga
361 ggactccccg gccacccgca ggtaccgccg ggcggagggc gcgctactag cagcgccgga
421 gatactcgag cccagggacc cccgggccag cggagggcag gagcggagcc ccgagggagc
481 gcgggccccg acggcgcgct cccccgtcag ccacgggcag gcaggccccg cgtggcggct
541 tggggtgggg ggctgcagcg gggccctcgg gccgaaagtc ccccgggcgg ccagccatga
601 ccttcgggcg cagcggggcg gcctcggtgg tgctgaacgt gggcggcgcc cggtattcgc
661 tgtcccggga gctgctgaag gacttcccgc tgcgccgcgt gagccggctg cacggctgcc
721 gctccgagcg cgacgtgctc gaggtgtgcg acgactacga ccgcgagcgc aacgagtact
781 tcttcgaccg gcactcggag gccttcggct tcatcctgct ctacgtgcgc ggccacggca
841 agctgcgctt cgcgccgcgg atgtgcgagc tctccttcta caacgagatg atctactggg
901 gcctggaggg cgcgcacctc gagtactgct gccagcgccg cctcgacgac cgcatgtccg
961 acacctacac cttctactcg gccgacgagc cgggcgtgct gggccgcgac gaggcgcgcc
1021 ccggcggggc cgaggcggct ccctccaggc gctggctgga gcgcatgcgg cggaccttcg
1081 aggagcccac gtcgtcgctg gccgcgcaga tcctggctag cgtgtcggtg gtgttcgtga
1141 tcgtgtccat ggtggtgctg tgcgccagca cgttgcccga ctggcgcaac gcagccgccg
1201 acaaccgcag cctggatgac cggagcagga taattgaagc tatctgcata ggttggttca
1261 ctgccgagtg catcgtgagg ttcattgtct ccaaaaacaa gtgtgagttt gtcaagagac
1321 ccctgaacat cattgattta ctggcaatca cgccgtatta catctctgtg ttgatgacag
1381 tgtttacagg cgagaactct caactccaga gggctggagt caccttgagg gtacttagaa
1441 tgatgaggat tttttgggtg attaagcttg cccgtcactt cattggtctt cagacactcg
1501 gtttgactct caaacgttgc taccgagaga tggttatgtt acttgtcttc atttgtgttg
1561 ccatggcaat ctttagtgca ctttctcagc ttcttgaaca tgggctggac ctggaaacat
1621 ccaacaagga ctttaccagc attcctgctg cctgctggtg ggtgattatc tctatgacta
1681 cagttggcta tggagatatg tatcctatca cagtgcctgg aagaattctt ggaggagttt
1741 gtgttgtcag tggaattgtt ctattggcat tacctatcac ttttatctac catagctttg
1801 tgcagtgtta tcatgagctc aagtttagat ctgctaggta tagtaggagc ctctccactg
1861 aattcctgaa ttaatgcatt gcaaatcaat tcttgcatac acttcataga aagactttga
1921 tgctgcttca tatttatgtg tttcttgctg ggtgagcact gcagtggcat tgtcatcatc
1981 ttggtagggt aaaaattatc cttcccagcc gaagggataa aacagtttac ttgttatgga
2041 gtaaatagaa ttgagactgc aaaggaagaa taatgactcc tagagtaaac tttaggaccc
2101 ggttttattt agacttgttt tcccgtttcc ttgaatgatt acacattttt aaaaaataca
2161 ttatttgaac attttaaaac agaaaggtac tattttccaa atgtttttcc atcttatgaa
2221 ttcagaagaa gcttggaact tatagtgttt tttgtttgag agtaacattt tcatttctaa
2281 atgttttata atttctcata tcaatgtcag aagtatcctg gaaacatatg tcacatgcgg
2341 gaactgttta acaaatactt taaaaatttg gccaaaattt aaactgtata atggagctag
2401 atacaagcaa gaatagtatt tgaaagactt ttccagcata cttctcaatt ctttgcttta
2461 tttttgtgcc aattattcac cttatcgtgc cgcttcatgg aagcttgagt atgttctccc
2521 ttttccattt tggatttatc tctttactgt aatgactcaa aaggtattta agaattgacg
2581 agagcttgtg ttgtttagca tcttactgga taatatttga attcattgct gttcctaggt
2641 gataactgtc ctaatattta gatgtccaaa caagaatact tccaacataa aaattataat
2701 aggaataatt tgagatgact caatattaca acctcttctt ctcttaacct cctcccccaa
2761 acactagagg tttaataaga cttatcagat gaaaggatat ttatatagcc ttttagtagc
2821 aaagtcatac ttacgtgttg tcactggatt atcataaaag ggagaaatta aatattactg
2881 tactcttagt tgctgtgtag ctaagtcaat tttaagccag taaaagcgat ggatacataa
2941 tgatttgatc tgatctttaa ctattgtgaa tcacagctac accaaaactc ttcttgtaag
3001 aatactgact aatatgccat gttaatctgg ctagattatt aggactagat aatgtaaaag
3061 tgatgattgt ttagtaacta aattttagca acagaaatta gaattttgct ttttcaacca
3121 gttaccataa agaagttagt gtatatataa acacaaataa ttagtgacag attcataaaa
3181 aattgaatgt tgtacacagt aattttgtca gaggtagaga agacagggat tgggaagtgg
3241 tgggtgatgg aggacctgga tatatttatc aaataaaggg ttaccagaag tgttcattaa
3301 aggaatttta gccatcatct agttcaaacc tcaactatta caggtagaaa atcagggcag
3361 gagagaatat aattgtgaag gagtcagggc taacacctgg atctccagaa acctagccca
3421 gcaggttaat cttcacacat ctctgggttc tgagaaaagc ctggaaaaat cacacttctt
3481 tgtcattgtc atgctgaggt aataatagca aaactgtttt ctttccctta atttcctttc
3541 ctaagcttat gtaatagttt ggccattaaa tatcttgccc tattttccct attactgcta
3601 gtatgctact tcttacatac ccaaaagaaa ttcagttatt tattgtatat ttattgtatt
3661 ctaatataat tgaaataaat ggcatggatt tattttttct taactatttg gattaaagct
3721 ttgtggttca tgcaaacaat gtgcagatga tagcacctcc atattactaa taaaaatatg
3781 ataaccatca a
Molecular cloning and characterization of Kv6.3, a novel 
modulatory subunit for voltage-gated K(+) channel Kv2.1. 

Sano Y, Mochizuki S, Miyake A, Kitada C, Inamura K, Yokoi H, Nozawa 
K, Matsushime H, Furuichi K. 

Molecular Medicine Laboratories, Institute for Drug Discovery Research, 
Yamanouchi Pharmaceutical Co., Ltd., 21 Miyukigaoka, Ibaraki 305-8585, 
Tsukuba, Japan. sano.yorikata@yamanouchi.co.jp 

We report identification and characterization of Kv6.3, a novel member of the 
voltage-gated K(+) channel. Reverse transcriptase-polymerase chain reaction 
analysis indicated that Kv6.3 was highly expressed in the brain. 
Electrophysiological studies indicated that homomultimeric Kv6.3 did not yield 
a functional voltage-gated ion channel. When Kv6.3 and Kv2.1 were co- 
expressed, the heteromultimeric channels displayed the decreased rate of 
deactivation compared to the homomultimeric Kv2.1 channels. 
Immunoprecipitation studies indicated that Kv6.3 bound with Kv2.1 in co- 
transfected cells. These results indicate that Kv6.3 is a novel member of the 
voltage-gated K(+) channel which functions as a modulatory subunit. 

MeSH Terms: 

• Amino Acid Sequence 

• Cloning, Molecular 

• Electric Conductivity 

• Humans 

• Ion Channel Gating 

• Molecular Sequence Data 

• Potassium Channels/classification 

• Potassium Channels/genetics 

• Potassium Channels/metabolism* 

• Potassium Channels, Voltage-Gated* 

• Protein Subunits 

• Sequence Homology, Amino Acid 

• Tissue Distribution 
Substances: 

• Potassium Channels 

• Potassium Channels, Voltage-Gated 

• Protein Subunits 

• delayed rectifier potassium channel 

• potassium channel Kv6.3 

Secondary Source ID: 

• GENBANK/AB070604 

• GENBANK/AB070605 

PMID: 11852086 [PubMed - indexed for MEDLINE] 
Obligatory heterotetramerization of three previously 
uncharacterized Kv channel α-subunits identified 
in the human genome 

N. Ottschytsch, A. Raes, D. Van Hoorick, and D. J. Snyders* 

Laboratory for Molecular Biophysics, Physiology, and Pharmacology, University of Antwerp (UIA) and Flanders Institute for Biotechnology (VIB), 
B2610 Antwerp, Belgium 

Edited by Lily Y. Jan, University of California School of Medicine, San Francisco, CA, and approved April 12, 2002 (received for review November 20, 2001) 



Voltage-gated K+ channels control excitability in neuronal and 
various other tissues. We identified three unique α-subunits of 
voltage-gated K+ channels in the human genome. Analysis of the 
full-length sequences indicated that one represents a previously 
uncharacterized member of the Kv6 subfamily, Kv6.3, whereas the 
others are the first members of two unique subfamilies, Kv10.1 and 
Kv11.1. Although they have all of the hallmarks of voltage-gated 
K+ channel subunits, they did not produce K+ currents when 
expressed in mammalian cells. Confocal microscopy showed that 
Kv6.3, Kv10.1, and Kv11.1 alone did not reach the plasma membrane, 
but were retained in the endoplasmic reticulum. Yeast 
two-hybrid experiments failed to show homotetrameric interactions, 
but showed interactions with Kv2.1, Kv3.1, and Kv5.1. 
Co-expression of each of the previously uncharacterized subunits 
with Kv2.1 resulted in plasma membrane localization with currents 
that differed from typical Kv2.1 currents. This heteromerization 
was confirmed by co-immunoprecipitation. The Kv2 subfamily 
consists of only two members and uses interaction with "silent 
subunits" to diversify its function. Including the subunits described 
here, the "silent subunits" represent one-third of all Kv subunits, 
suggesting that obligatory heterotetramer formation is more 
widespread than previously thought. 

electrically silent subunits | ER retention | heterotetrameric 
assembly | KCNG3 

Voltage-gated potassium channels are transmembrane proteins 
consisting of four α-subunits that form a central 
permeation pathway. Each subunit contains six transmembrane 
domains (S1-S6) and a pore loop containing the GYG-motif, the 
signature sequence for potassium selectivity. The fourth transmembrane 
domain (S4) contains positively charged residues and 
is the major part of the voltage sensor. Voltage-gated potassium 
channels serve a wide range of functions including regulation of 
the resting membrane potential and control of the shape, 
duration, and frequency of action potentials (1-3). 

At present, 26 genes have been described encoding for different 
Kv α-subunits. These are divided into subfamilies by 
sequence similarities: within a subfamily members share ~70% 
of sequence identity, whereas between different subfamilies this 
percentage drops to ~40%, reflecting the homology in the core 
section S1-S6 (4). The Kv family of potassium channels consists 
of nine subfamilies, Kv1 through Kv9, although Kv7 has only 
been described for Aplysia (5). The subunits of the Kv1 through 
Kv4 subfamilies all show functional expression in a homotetrameric 
configuration. Despite having the typical topology of 
voltage-gated potassium channel subunits, the subunits of the 
Kv5 through Kv9 families cannot generate current by themselves 
(6-10). For instance, Kv6.1 fails to form homotetrameric channels, 
but it is able to form heterotetrameric channels with Kv2.1; 
expression of these heterotetramers resulted in currents with 
clearly distinguishable properties (11). All known "electrically 
silent" subunits have been shown to form heterotetrameric 
channels with the members of the Kv2 subfamily (8-10). In a 
sense, these "silent" subunits can be considered regulatory 
subunits; e.g., the metabolic regulation of the Kv2.1/Kv9.3 
heteromultimer might play an important role in hypoxic pulmonary 
artery vasoconstriction and in the possible development of 
pulmonary hypertension (8). 

In this study we report the cloning and functional properties 
of three previously uncharacterized subunits that were identified 
in the early public draft version of the human genome. Based on 
sequence identity, one of these is a previously uncharacterized 
member of the Kv6 subfamily (Kv6.3), whereas the others are the 
first members of two unique subfamilies, Kv10.1 and Kv11.1. 
Biochemical, microscopic, and functional analysis indicated that 
these previously uncharacterized subunits are all "silent sub- 
units," which may explain why they have not been cloned 
previously. Through obligatory heterotetramerization they exert 
a function-altering effect on other Kv subunits. 

Experimental Procedures 

Cloning of Kv2.1, Kv6.3, Kv10.1, and Kv11.1. The coding sequence of 
human Kv2.1 was amplified from a human brain library (CLONTECH) 
and cloned into pEGFP-N1. The channel sequences of 
Kv6.3, Kv10.1, and Kv11.1 were obtained through a BLAST search 
of the high throughput genomic sequence (htgs) database (July 
2000). The coding sequences were cloned using PCR amplification 
from a human brain library (CLONTECH) or a human 
testis library (TaKaRa, Shiga, Japan) for Kv10.1 and Kv11.1, 
respectively. Both coding exons of Kv6.3 were amplified from 
human genomic DNA. The BsaMI restriction site at the start of 
the second exon was used to join the two coding exons. 

Amino Acid Sequence Alignments and Phylogenetic Tree. Computer 
analyses were performed using megalign (DNAstar, Madison, 
WI). The phylogenetic tree and the percentage of identity were 
obtained by aligning the core S1-S6 sequences (e.g., aa 252-518 
in Kv1.5). 
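
A percentage of identity over an existing alignment can be computed as sketched below. This is an illustrative calculation only: it assumes two pre-aligned strings of equal length with "-" as the gap character and counts identities over ungapped columns, which may differ from the definition used by the alignment software named above. The function name and the toy strings in the usage note are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of ungapped alignment columns in which both residues match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```

Calling `percent_identity("GYGD", "GYGD")` on two identical toy strings returns 100.0; real use would pass the aligned S1-S6 core segments.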

Expression Analysis. A cDNA panel from different tissues was 
obtained from CLONTECH (cDNA panel I and II). PCR was 
performed with primer sets that were selected to ensure the 
amplification of the correct subunit, without amplification of 
homologous subunits. All reactions were done with 38 cycles and 
PCR products were analyzed on a 2% agarose gel. 

Transfection. Ltk- cells were cultured and transfected with cDNA 
as reported (12). Each subunit was coexpressed with Kv2.1 (10:1 



This paper was submitted directly (Track II) to the PNAS office. 

Abbreviations: ER, endoplasmic reticulum; GFP, green fluorescent protein. 

Data deposition: The sequences reported in this paper have been deposited in the GenBank 
database (accession nos. AF348982-AF348984). 

werp (UIA), Universiteitsplein 1, T4.21, 2610 Antwerp, Belgium. E-mail: dirk.snyders@ua.ac.be. 

Fig. 1. Sequence alignment, phylogenetic tree, and percent sequence identity of Kv6.3, Kv10.1, and Kv11.1. (A) The amino acid [...] 
Kv6.3, Kv10.1, and Kv11.1 were aligned using megalign. For convenience, only the first 460 aa of Kv2.1 are shown. Gaps (indicated by dashes) were introduced 
in the sequence to maintain the alignment. Conserved amino acids are shaded in gray. The six putative transmembrane domains and the pore region are indicated 
by an overline. (B) The phylogenetic tree for the Kv family. (C) The percent sequence similarity based on the S1-S6 core. 



ratio). At this ratio, less than 0.01% of the channels will be 
wild-type Kv2.1. Between 12 and 24 h post-transfection the cells 
were trypsinized and used for analysis. 
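
The "less than 0.01%" figure is consistent with a simple random-assembly estimate. The sketch below is an assumption-laden illustration, not the authors' stated derivation: it supposes that the 10:1 DNA ratio translates directly into subunit availability and that four subunits assemble independently and at random. The function name is hypothetical.

```python
from fractions import Fraction


def homotetramer_fraction(kv21_parts: int, other_parts: int, subunits: int = 4) -> Fraction:
    """Fraction of channels made only of Kv2.1, assuming random independent assembly."""
    p_kv21 = Fraction(kv21_parts, kv21_parts + other_parts)
    return p_kv21 ** subunits


# One part Kv2.1 to ten parts of the co-expressed subunit (assumed reading of "10:1").
fraction = homotetramer_fraction(1, 10)
print(fraction)
print(f"{float(fraction) * 100:.4f}%")
```

Under these assumptions the fraction is (1/11)^4 = 1/14641, about 0.0068%, which is below the 0.01% quoted in the text.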

Whole-Cell Current Recording. Current recordings were made with 
an Axopatch-200B amplifier (Axon Instruments, Union City, 
CA) in the whole cell configuration of the patch-clamp technique 
(13) as reported (12). 

Pulse Protocols and Data Analysis. The applied pulse protocols are 
listed in the figure legends. The voltage dependence of channel 
opening and inactivation (activation and inactivation curves) was 
fitted with a Boltzmann equation according to $y = 1/\{1 + \exp[-(E - V_{1/2})/k]\}$, 
where $V_{1/2}$ represents the voltage at which 
50% of the channels are open or inactivated and $k$ the slope 
factor. Activation and deactivation kinetics were fitted with a 
single or double exponential function by using a nonlinear 
least-squares (Gauss-Newton) algorithm. Results are presented 
as mean ± SEM; statistical analysis was done using the Student's 
t test; probability values are presented in the text. 
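
The Boltzmann relation described above can be written as a small function. This is a minimal sketch for illustration: the parameter names and the numerical values in the example are hypothetical and are not taken from the paper, and the nonlinear least-squares fitting step itself is not shown.

```python
import math


def boltzmann(voltage_mv: float, v_half_mv: float, slope_k_mv: float) -> float:
    """Fraction of channels open (or inactivated) at a given voltage.

    Implements y = 1 / (1 + exp(-(E - V_half) / k)). A positive slope factor
    gives a rising (activation-like) curve; a negative one gives a falling
    (inactivation-like) curve.
    """
    return 1.0 / (1.0 + math.exp(-(voltage_mv - v_half_mv) / slope_k_mv))


# Hypothetical parameters: midpoint at -10 mV, slope factor 8 mV.
for e in (-50.0, -10.0, 30.0):
    print(e, round(boltzmann(e, -10.0, 8.0), 3))
```

At the midpoint voltage the function returns exactly 0.5, matching the definition of the half-activation voltage in the text.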

Yeast Two-Hybrid System and Protein Constructs. The MATCHMAKER 
Yeast Two-Hybrid System 3 (CLONTECH) was used 
to assay for protein-protein interactions. The amino termini of 
Kv1.5, Kv2.1, Kv3.1, Kv4.3, Kv5.1, Kv6.1, Kv6.3, Kv8.1, Kv9.3, 
Kv10.1, and Kv11.1 were cloned into the vector pGBKT7. The 
amino termini of Kv2.1, Kv6.3, Kv10.1, and Kv11.1 were also 
cloned into the vector pGADT7. AH109 cells were transformed 
with the plasmid constructs of interest (100 ng of each) and 
plated on -Trp/-Leu/+XαGAL media to select for cells 
containing both vectors and to test for interaction. The degree 
of interaction was determined from the speed and intensity of 
the blue color development. 

Coimmunoprecipitation. Kv2.1 was c-myc-tagged at the C terminus 
and cotransfected with green fluorescent protein (GFP)-tagged 
Kv2.1, Kv1.5, Kv6.3, Kv10.1, or Kv11.1 into HEK293 cells. The 
next day the cells were solubilized on ice with a PBS buffer 
supplemented with 5 mM EDTA, 1% Triton X-100, and a 
complete protease inhibitor mixture (Roche Diagnostics). For 
the immunoprecipitation Protein G Agarose beads and 2 μg of 
anti-GFP (CLONTECH) were added. The samples were incubated 
overnight at 4°C with rocking. Beads were then washed 
with ice-cold solubilization buffer. Proteins were eluted from the 
beads by boiling in SDS sample buffer and analyzed on 8% 
SDS/PAGE. Proteins were transferred to a polyvinylidene 
difluoride (PVDF) membrane (Amersham Pharmacia Biotech) 
and the blot was blocked. The blot was incubated with anti-c-myc 
(CLONTECH); afterwards, anti-mouse IgG (Amersham Pharmacia 
Biotech) was added, followed by ECL detection (Amersham 
Pharmacia Biotech). 

Fig. 2. Expression of Kv6.3, Kv10.1, Kv11.1, and Kv2.1 in human tissues. A PCR 
analysis was performed on a cDNA panel of the indicated human tissues with 
gene-specific primers for the subunits indicated on the left. 



Confocal Imaging. Kv6.3, Kv10.1, and Kv11.1 were tagged with
GFP at their carboxy terminus. HEK293 cells were cultivated on
coverslips. For cotransfections, a 1:10 ratio of channel DNA
versus Kv2.1 was added. The endoplasmic reticulum (ER) was
visualized with the DsRed ER localization vector. This was
constructed starting from the pDsRed vector (CLONTECH).
The first 17 aa from calreticulin were amplified from brain
cDNA and cloned in frame with the DsRed sequence of pDsRed.
The KDEL sequence was inserted behind DsRed by using a
mutagenesis PCR. Transfections and cotransfections (ratio 1:10
GFP-labeled channel DNA versus unlabeled Kv2.1 DNA) were
done using the Lipofectamine method (see above). Confocal
images were obtained on a Zeiss CLSM 510, equipped with an
argon laser (excitation, 488 nm) for the visualization of GFP and
DsRed.

Results 

Cloning of Kv6.3, Kv10.1, and Kv11.1. A search of the GenBank high
throughput genomic sequence (htgs) database revealed genomic
contigs containing exons coding for three previously uncharacterized
homologues of Kv channels. The sequences of the
genomic contigs were analyzed using genefinder to determine
the full coding sequences of the genes. The predicted proteins
displayed the typical topology of a Kv subunit: six transmembrane
segments (S1-S6), with an array of five to six positive
charges in S4, and the potassium selectivity motif "GYG" in the
P-loop between S5 and S6 (Fig. 1). Each gene was predicted to
consist of two coding exons, without evidence for alternate
splicing.

One of the predicted proteins consisted of 519 aa and shared
more than 70% sequence identity with Kv6.1 and Kv6.2 (Fig. 1).
Therefore, this protein has to be regarded as a previously
uncharacterized member of the Kv6 subfamily, Kv6.3 or
KCNG3. The other two proteins were composed of 436 and 545
aa and shared only ≈40% sequence identity with any of the
previously identified Kv subunits. Therefore, we classified them
as the first members of two previously uncharacterized subfamilies,
Kv10.1 and Kv11.1.

The chromosomal locations of the genomic contigs containing
Kv6.3, Kv10.1, and Kv11.1 are 16q24.1, 2p21, and 9p24.2,
respectively. The complete cDNA sequences have been submitted
to the GenBank database under accession nos. AF348982,
AF348983, and AF348984 for Kv10.1, Kv11.1, and Kv6.3,
respectively.

Fig. 3. Whole-cell current recordings of Kv6.3, Kv10.1, and Kv11.1, and the
cotransfections with Kv2.1. The top sections in each panel show typical
recordings for untransfected Ltk- cells (A), or for cells expressing Kv6.3 (B),
Kv10.1 (C), and Kv11.1 (D). The holding potential was -80 mV and cells were
depolarized in 20-mV increments from -80 mV to +60 mV, 500 ms in duration,
followed by a repolarizing pulse at -25 mV, 850 ms in duration. The bottom
sections of each panel show typical recordings from Ltk- cells expressing Kv2.1
(A), Kv2.1 + Kv6.3 (B), Kv2.1 + Kv10.1 (C), and Kv2.1 + Kv11.1 (D). The holding
potential was -80 mV and cells were depolarized by 10-mV increments from
-60 mV to +70 mV, 500 ms in duration. Deactivating tails were recorded at
-25 mV or -35 mV for 850 ms.



Tissue Distribution of Kv6.3, Kv10.1, and Kv11.1. The search of the
GenBank EST database yielded several hits for all three sequences,
indicating that their mRNAs are indeed expressed.
PCR analysis was used to assess the expression of Kv6.3, Kv10.1,
and Kv11.1 mRNA in various human tissues (Fig. 2). Kv6.3
showed strong expression in brain and low expression in liver,
small intestine, and colon. Kv10.1 was strongly expressed in
pancreas and testis and weakly in brain, lung, kidney, thymus,
ovary, small intestine, and colon. Kv11.1 gave a strong signal in
pancreas and testis and a weaker signal in lung, liver, kidney,
spleen, thymus, prostate, and ovary.

Functional Expression of Kv6.3, Kv10.1, and Kv11.1 in Ltk- Cells. The
coding sequences of Kv6.3, Kv10.1, and Kv11.1 were cloned into
mammalian expression vectors for transient transfection in Ltk-
cells. The subunits Kv6.3, Kv10.1, and Kv11.1 each failed to
generate current above background in these cells, as shown in the
top sections of each panel in Fig. 3 (n > 10 cells, for at least two
independent transfections for each clone). Previously discovered
silent subunits can form heterotetrameric channels with the Kv2
subfamily (6-10). To test whether this could also be the case for
the previously uncharacterized subunits, we performed coexpressions
with Kv2.1.

Expression of the human Kv2.1 subunit alone resulted in a
typical rapidly activating delayed outward rectifier K+ current
with functional properties as described (14, 15). The bottom
sections of each panel in Fig. 3 show that coexpression with
either previously uncharacterized subunit resulted in currents
with distinct properties. For the cotransfection of Kv2.1 with
Kv6.3, the threshold for activation was shifted by approximately
20 mV in hyperpolarizing direction compared with Kv2.1 alone
(Fig. 4A). In addition, V1/2 (Table 1) was significantly (P < 0.001)
shifted toward hyperpolarizing voltages, and the slope decreased
as well (P < 0.001). Cotransfection of Kv2.1 with Kv10.1 had no
significant (P > 0.05) effect on the activation curve, whereas with
Kv11.1 a small but consistent (P < 0.05) -5 mV shift was
observed.

Cotransfection of Kv2.1 with Kv6.3 also markedly changed
the C-type inactivation (Fig. 4B): the cotransfection resulted in
a 40-mV hyperpolarizing shift in the voltage dependence of
inactivation (P < 0.001). Cotransfection of Kv10.1 had no
significant effect on the voltage dependence of inactivation (P >
0.05), whereas Kv11.1 gave a small -5 mV shift (P < 0.05).

Fig. 4. (A) Voltage dependence of activation. The activation curves of Kv2.1,
Kv2.1 + Kv6.3, Kv2.1 + Kv10.1, and Kv2.1 + Kv11.1 were obtained from the
normalized initial tail amplitude recorded at -25 mV for Kv2.1, Kv2.1 +
Kv10.1, and Kv2.1 + Kv11.1 or at -50 mV for Kv2.1 + Kv6.3 after 500-ms
prepulses ranging from -60 mV to 70 mV in 10-mV steps. The solid line
represents the Boltzmann function fitted to the experimental data (see
Experimental Procedures). (B) Voltage dependence of inactivation. The
inactivation curves of Kv2.1, Kv2.1 + Kv6.3, Kv2.1 + Kv10.1, and Kv2.1 + Kv11.1
were obtained from the normalized peak currents recorded during a 250-ms
test pulse to 50 mV as a function of the 5-s prepulse ranging from -50 mV to
10 mV for Kv2.1, Kv2.1 + Kv10.1, and Kv2.1 + Kv11.1 and from -80 mV to -20
mV for Kv2.1 + Kv6.3. Experimental data were fitted with a Boltzmann
function (solid lines). (C) Kinetics of activation and deactivation of Kv2.1 and
Kv2.1 + Kv6.3. Mean time constants ± SEM of activation and deactivation are
plotted as a function of the test potential. To obtain the time constants for
activation, test pulses were applied ranging from -10 mV to 70 mV for Kv2.1
and -30 mV to 70 mV for Kv2.1 + Kv6.3 in 10-mV steps, 500 ms in duration.
To obtain the time constants for deactivation, a 200-ms prepulse to 50 mV was
followed by test pulses ranging from -20 mV to -100 mV in 10-mV steps, 850
ms in duration. The experimental data were fitted with mono- or double-exponential
functions, as appropriate. The slow components of activation and
deactivation are shown as triangles, whereas the fast components are shown
as circles. WT Kv2.1 gating kinetics are connected with a solid line. (D) Kinetics
of activation of Kv2.1, Kv2.1 + Kv10.1, and Kv2.1 + Kv11.1. Mean time
constants ± SEM of activation are shown as a function of the step potentials
(-10 mV to 70 mV). The pulse protocol for Kv2.1 + Kv10.1 and Kv2.1 + Kv11.1
is the same as for Kv2.1 alone in C.

Table 1. Electrophysiological parameters

Fig. 5. Interaction of Kv6.3, Kv10.1, and Kv11.1 with representative subunits
of all Kv subfamilies. The intracellular N-terminal segment that contains the
subfamily-specific NAB domain was used as bait (B) and/or target (T) in a yeast
two-hybrid analysis.

The time-course of activation of Kv2.1 was fitted with a
monoexponential function and resulted in time constants shown
in Fig. 4 C and D. Upon cotransfection with Kv6.3, activation was
accelerated (P values at all voltages <0.001) and the time course
of activation was approximated better with a double exponential
function (Fig. 4C). The acceleration of activation was less
pronounced for the cotransfection of Kv10.1 or Kv11.1 (Fig. 4D),
but still statistically significant (P < 0.05 at all voltages).
Deactivation was fitted with a mono- or double-exponential
function as appropriate. Cotransfection of Kv6.3 slowed deactivation
significantly (P value at all voltages <0.001), whereas
Kv10.1 and Kv11.1 had no significant effect (P value at all
voltages >0.05). A summary of the electrophysiological parameters
is given in Table 1.

Tissue Distribution of Kv2.1. To test whether the previously
uncharacterized subunits could be regulatory subunits for Kv2.1 in
vivo, we determined the expression of Kv2.1 with the same
cDNA panels as we did for Kv6.3, Kv10.1, and Kv11.1 (Fig. 2).
Kv2.1 showed very high expression in brain, skeletal muscle,
pancreas, and small intestine and moderate to high expression in
heart, placenta, lung, liver, kidney, spleen, thymus, prostate,
testis, ovary, and colon, consistent with previous reports (6, 16,
17). These results show that Kv6.3, Kv10.1, and Kv11.1 are
expressed in several tissues in which Kv2.1 is also expressed,
indicating that at least in some tissues these subunits could
indeed interact with Kv2.1 to form heterotetrameric channels.

Biochemical Evidence for Selective Interaction with Kv Subunits. To
explore in a more unbiased manner the potential interactions of
the three previously uncharacterized subunits with subunits of
the known subfamilies, we used a yeast two-hybrid approach.
Given the limitations of this method, we screened with the
intracellular amino terminal segment, which contains the NAB
domain that regulates coassembly (18-21). Kv6.3, Kv10.1, and
Kv11.1 each did not show interactions with themselves, nor with
each other, Kv1.5, Kv4.3, Kv8.1, and Kv9.1 (Fig. 5). In contrast,
a strong interaction with Kv2.1, Kv3.1, and Kv5.1 was seen. For
each of the previously uncharacterized subunits this interaction
was as strong as the interaction of Kv2.1 with itself (positive
control). Kv2.1 failed to interact with Kv1.5, consistent with the
known lack of heterotetramerization between Kv1.5 and Kv2.1
(18, 22). The interaction of Kv2.1 with the previously uncharacterized
subunits was confirmed with coimmunoprecipitation,
using the full-length proteins (Fig. 6).

Values are given as mean ± SEM; n = number of experiments. V1/2 and k obtained from Boltzmann fit (see Experimental Procedures). N.A., not applicable.

Fig. 6. Coimmunoprecipitation of Kv6.3GFP, Kv10.1GFP, and Kv11.1GFP
with Kv2.1c-myc. Immunoprecipitation was done with anti-GFP antibodies.
Western blot was performed with anti-c-myc. Lanes 3-5 show that Kv2.1c-myc
was coprecipitated with Kv6.3GFP, Kv10.1GFP, and Kv11.1GFP. GFP-tagged
Kv2.1 (lane 1) and Kv1.5 (lane 2) were used as positive and negative controls,
respectively.

Subcellular Localization of Kv6.3, Kv10.1, and Kv11.1. Although the
lack of N-terminal tetramerization might explain the lack of
current, it is known that Kv2.1 can generate current when the
NAB domain is removed (18). Therefore, we also determined
the subcellular localization of the previously uncharacterized
subunits by using confocal microscopy. To visualize the subcellular
protein distribution, GFP was fused to their carboxy
termini. Transfected cells expressing only Kv6.3, Kv10.1, or
Kv11.1 showed a punctate intracellular appearance without
staining of the plasma membrane (Fig. 7, column 1). This
indicates that the full-length protein was made, because GFP was
added on the C-terminal end. To test whether this pattern
reflected retention in the ER, we performed cotransfections
with a vector (DsRed-ER) containing the cDNA from the red
fluorescent protein DsRed, fused with the ER targeting signal
from calreticulin and the ER retention signal, KDEL (Fig. 7,
column 2). The localization of the red and green fluorescence
overlapped completely, resulting in a yellow-orange color
indicating that each of the three subunits was retained in the ER
when expressed alone (Fig. 7, column 3). When Kv2.1
was coexpressed with these subunits, a redistribution of the
green fluorescence was observed. In each case, prominent GFP
staining was evident at the plasma membrane with minimal
intracellular staining (Fig. 7, column 4). Potassium currents
obtained with the GFP-tagged subunits were similar to those
shown in Figs. 3 and 4. The intracellular staining was a nearly
pure DsRed-ER fluorescence, showing hardly any overlap (Fig.
7, column 5). These results indicate that Kv2.1 promotes trafficking
of Kv6.3, Kv10.1, and Kv11.1 to the cell surface membrane,
presumably by forming heterotetrameric channels with
these subunits.

Discussion 

This study reports the cloning and characterization of three
previously uncharacterized α-subunits of voltage-gated potassium
channels: Kv6.3, Kv10.1, and Kv11.1. The conventional
methods to clone potassium channels include homology and
expression cloning (14, 23). The disadvantage of both techniques
is their dependence on expression level or on a functional
signature: genes with very low expression levels or lacking a
functional signature are not (easily) picked up by these techniques.
The human genome project makes it possible to detect and
clone such genes, as is demonstrated here for Kv6.3, Kv10.1, and
Kv11.1.

When expressed in mammalian Ltk- cells, each of the three
subunits was unable to elicit any current, indicating that they
belong to the "silent" subunits (6-10). The lack of functional
current can be explained by retention in the ER, as was
demonstrated with confocal microscopy, comparable with
observations for Kv8 and Kv9 subunits (9, 16, 24). Such retention
can have various causes such as ER retention signals or improper
folding and/or assembly. Investigation of the sequences of
Kv6.3, Kv10.1, and Kv11.1 did not reveal known ER retention or
export signals, suggesting an assembly problem. For the confocal
imaging we used a C-terminal GFP tag, which could interfere
with trafficking. Indeed, C-terminal sequences can control efficient
cell surface expression and clustering (25, 26). However,
the three subunits reported here do not display such sequences
and the GFP tag did not affect the currents recorded after
coexpression. Inefficient assembly of channel subunits might
originate from the amino-terminal "NAB" domain that directs
and restricts subunit assembly within Kv subfamilies (18-21).
Indeed, the amino termini of Kv6.3, Kv10.1, and Kv11.1 did not
interact with themselves, as was demonstrated with a yeast
two-hybrid analysis. Therefore, to the extent that the NAB
domain facilitates homotetrameric assembly, these subunits
would appear incapable of efficient homotetramerization, which
might explain ER retention.

Fig. 7. Subcellular localization of the channel-GFP fusion proteins assessed
by confocal imaging. The rows show images with GFP fusion proteins of Kv6.3,
Kv10.1, and Kv11.1, respectively. The first three columns show the fluorescence
of the channel subunits, the DsRed-ER localization vector, and the
overlay of both, respectively. The last two columns show cells cotransfected
with Kv2.1, DsRed-ER, and each of the subunits; the surface staining of the
GFP-tagged subunits (fourth column) is obvious with minimal overlap with the
DsRed-ER fluorescence (overlay of both in the fifth column). (Scale bar, 10 μm.)

However, these incompatible amino termini may not be the
only reason for the lack of functionality for these and other silent
Kv channels. Indeed, distinct currents were observed for a
chimera between the N terminus of Kv8.1 in a Kv1.3 background
(7). However, a chimera with S6 from Kv8.1 in a Kv1.3 background
(and vice versa) was not functional, which indicates that
part of the nonfunctionality resides in the S6 segment. The
alignment of this segment (Fig. 8) demonstrates that the three
subunits reported here, as well as previously cloned silent
subunits, all lack the second proline of the conserved P-X-P
motif of the Kv1-Kv4 subunits. This points to a major structural
difference in the S6 segments between the functional and silent
subunits. However, when P406 and V409 of Kv2.1 were mutated
to the corresponding residues of Kv8.1, altered but functional
currents were observed, indicating that these residues alone do
not explain the nonfunctional S6 chimera (24). However, P406
is the second proline in the highly conserved P-X-P motif from
the functional Kv channels and might be responsible for a sharp
bend in the S6 helical structure involved in gating (27). All of the
silent subunits lack the second proline of the P-X-P motif,
indicating a structural difference of the S6 segment between the
functional and the silent subunits, which is apparently compensated
in the heterotetrameric configuration.

Fig. 8. Alignment of the S6 segment of the Kv potassium channels. One
member of each subfamily is represented. Conserved amino acids are shaded
in gray. Sequence numbering of Kv2.1 is shown on top.

The profound effects of Kv6.3 on Kv2.1 gating properties
suggest an important role for these heterotetramers: the latter
would be inactivated at potentials close to resting potential (V1/2
for inactivation is -56 mV) in contrast to the homotetrameric
Kv2.1 channels (V1/2 = -16 mV). Because both subunits are
expressed in the brain (Fig. 2), functional heterotetramers could
exist (6, 17). Previous studies on the sustained delayed rectifier
component of hippocampal neurons showed properties that are
comparable with those of Kv2.1 and Kv6.3 heteromultimers (28,
29). At -5 mV the two time constants for activation for the
current in those neurons were 53 ms and 190 ms, which is
comparable with heterotetrameric channels of Kv2.1 and Kv6.3
(Table 1). In addition, the midpoint of inactivation was more
negative (-96 mV), which is at least closer to -56 mV for Kv2.1
and Kv6.3 compared with -16 mV for Kv2.1 alone. Furthermore,
the pharmacological profile for homomeric Kv2.1 channels
did not correspond completely with that of the sustained
delayed rectifier component: the TEA sensitivity depended on
the cell type under investigation (29) and differed from Kv2.1.
Thus native channels that are considered to contain Kv2 subunits
may well be heterotetramers, although it will be a challenge to
assign the proper heterotetrameric combination.

In neurons, Kv2.1 is thought to have a role in controlling the
membrane potential and in the electrical signaling of cells (30,
31). Using antisense oligonucleotides it was demonstrated that
somato-dendritic excitability was regulated by Kv2.1 in hippocampal
neurons (32). The down-regulation of the Kv2.1
protein (>90%) was associated with action potential broadening
and an increase in intracellular calcium at high-frequency stimulation.
The gating properties were not reported in this study but
the 90% down-regulation of the Kv2.1 protein was associated
with only a 50% reduction of the sustained delayed rectifier
component. Although the molecular nature of the sustained
current remains to be elucidated, a heterotetrameric subunit
composition could be compatible with such dominant negative
results.

Within the Kv1, Kv3, and Kv4 families, functional diversity is
achieved by the different properties of each subunit, and by the
heteromeric (intrafamily) assembly of α-subunits resulting in
channels with distinct biophysical properties. The Kv2 subfamily
contains only two members that have very similar biophysical
properties. While Kv2.1 and Kv2.2 are capable of heteromultimerization,
the resulting currents are functionally similar to
those of their homotetramers (33). It has been suggested that the
functional diversity within this family is achieved through heteromeric
assembly with other subfamilies of silent subunits (34).
Our results now add three more subunits that can expand further
the functional diversity in different types of tissue or during
development.

Thus far, 19 functional Kv α-subunits have been discovered and
only 7 silent subunits. Our results enlarge this last group to 10
subunits. Despite the large number of these subunits, their exact
physiological role is still poorly understood, mainly because of the
difficulty in recognizing a silent subunit in isolated cells or in
tissue. Thus far, heterologous expression studies have led to the
hypothesis that the silent subunits must interact with other Kv
subunits from the Kv2 and Kv3 subfamilies to regulate their
function. If each of the silent subunits can interact with the two
members of the Kv2 subfamily and the four members of the Kv3
subfamily, then at least 60 different heterotetramers are possible
(each with one to three silent subunits). Thus, this growing group
of silent subunits considerably expands the potential for molecular
diversity of the native K+ channels. Future experiments
will be necessary to reveal the true interaction partners
and the physiological importance of the silent subunits.
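
The count of "at least 60" follows from simple multiplication: 10 silent subunits times 6 possible partners (2 Kv2 plus 4 Kv3 members). The sketch below reproduces that arithmetic. Treating the three silent:partner stoichiometries (1:3, 2:2, 3:1) as distinct channels is an assumption added here for illustration; it ignores subunit arrangement within the tetramer and mixtures of more than two subunit types.

```python
def count_heterotetramers(silent_subunits, partner_subunits):
    """Return (pairings, pairings counted per stoichiometry)."""
    # One pairing = one silent subunit type with one partner subunit type.
    pairings = silent_subunits * partner_subunits
    # Illustrative assumption: one, two, or three silent subunits per tetramer.
    stoichiometries = [(s, 4 - s) for s in range(1, 4)]
    return pairings, pairings * len(stoichiometries)


kv2_members = 2      # Kv2 subfamily members, as stated in the text
kv3_members = 4      # Kv3 subfamily members, as stated in the text
silent = 10          # silent subunits after this study, as stated in the text

pairings, with_stoichiometry = count_heterotetramers(
    silent, kv2_members + kv3_members
)
print(pairings, with_stoichiometry)
```

The first number is the lower bound quoted in the text; the second only shows how quickly the count grows once stoichiometry is considered.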

Note Added in Proof. While this paper was under review, another group
(35) reported cloning of "Kv6.3," which corresponds to Kv10.1 in our
analysis (see Fig. 1).

We thank Dr. Jean-Pierre Timmermans for the use of the confocal
microscope. This work was supported by Flanders Institute for Biotechnology
Grant PRJ05 and National Institutes of Health/National Heart,
Lung, and Blood Institute Grant HL59689.



19. Xu, J., Yu, W., Jan, Y. N., Jan, L. Y. & Li, M. (1995) J. Biol. Chem. 270,
24761-24768.

20. Papazian, D. M. (1999) Neuron 23, 7-10.

21. Shen, N. V. & Pfaffinger, P. J. (1995) Neuron 14, 625-633.

22. Covarrubias, M., Wei, A. A. & Salkoff, L. (1991) Neuron 7, 763-773.

23. Tamkun, M. M., Knoth, K. M., Walbridge, J. A., Kroemer, H., Roden, D. M.
& Glover, D. M. (1991) FASEB J. 5, 331-337.

24. Salinas, M., de Weille, J., Guillemare, E., Lazdunski, M. & Hugnot, J. P. (1997)
J. Biol. Chem. 272, 8774-8780.

25. Burke, N. A., Takimoto, K., Li, D., Han, W., Watkins, S. C. & Levitan, E. S.
(1999) J. Gen. Physiol. 113, 71-80.

26. Li, D., Takimoto, K. & Levitan, E. S. (2000) J. Biol. Chem. 275, 11597-11602.

27. del Camino, D., Holmgren, M., Liu, Y. & Yellen, G. (2000) Nature (London)
403, 321-325.

28. Numann, R. E., Wadman, W. J. & Wong, R. K. (1987) J. Physiol. (London) 393,
331-353.

29. Zhang, L. & McBain, C. J. (1995) J. Physiol. (London) 488, 647-660.

30. Murakoshi, H. & Trimmer, J. S. (1999) J. Neurosci. 19, 1728-1735.

31. Baranauskas, G., Tkatch, T. & Surmeier, D. J. (1999) J. Neurosci. 19,
6394-6404.

32. Du, J., Haak, L. L., Phillips, T. E., Russell, J. T. & McBain, C. J. (2000)
J. Physiol. (London) 522, 19-31.

33. Blaine, J. T. & Ribera, A. B. (1998) J. Neurosci. 18, 9585-9593.

34. Kramer, J. W., Post, M. A., Brown, A. M. & Kirsch, G. E. (1998) Am. J. Physiol.
274, C1501-C1510.

35. Sano, Y., Mochizuki, S., Miyake, A., Kitada, C., Inamura, K., Yokoi, H.,
Nozawa, K., Matsushime, H. & Furuichi, K. (2002) FEBS Lett. 512, 230-234.



Ottschytsch et al.

PNAS | June 11, 2002 | vol. 99 | no. 12 | 7991



Entrez PubMed 



EXHIBIT F 




Brain Res Mol Brain Res. 2004 Apr 7;123(1-2):91-103.



Modification of Kv2.1 K+ currents by the silent Kv10 subunits.

Vega-Saenz de Miera EC.

Department of Physiology and Neuroscience, New York University School of
Medicine, 550 First Avenue, New York, NY 10016, USA.
vegae01@endeavor.med.nyu.edu

Human and rat Kv10.1a and b cDNAs encode silent K+ channel pore-forming
subunits that modify the electrophysiological properties of Kv2.1. These
alternatively spliced variants arise by the usage of an alternative site of splicing
in exon 1 producing an 11-amino acid insertion in the linker between the first
and second transmembrane domains in Kv10.1b. In human, the Kv10 mRNAs
were detected by Northern blot in brain, kidney, lung, and pancreas. In brain, they
were expressed in cortex, hippocampus, caudate, putamen, amygdala and
weakly in substantia nigra. In rat, Kv10.1 products were detected in brain and
weakly in testes. In situ hybridization in rat brain shows that Kv10.1 mRNAs
are expressed in cortex, olfactory cortical structures, basal ganglia/striatal
structures, hippocampus and in many nuclei of the amygdala complex. The CA3
and dentate gyrus of the hippocampus present a gradient that shows a
progression from high level of expression in the caudo-ventro-medial area to a
weak level in the dorso-rostral area. The CA1 and CA2 areas had low levels
throughout the hippocampus. Several small nuclei were also labeled in the
thalamus, hypothalamus, pons, midbrain, and medulla oblongata. Co-injection
of Kv2.1 and Kv10.1a or b mRNAs in Xenopus oocytes produced smaller
currents than in the Kv2.1 injected oocytes and a moderate reduction of the
inactivation rate without any appreciable change in recovery from inactivation
or voltage dependence of activation or inactivation. At higher concentration,
Kv10.1a also reduces the activation rate and produces a more important
reduction in the inactivation rate. The gene that encodes for Kv10.1 mRNAs maps to
chromosome 2p22.1 in human, 6q12 in rat and 17E4 in mouse, locations
consistent with the known synteny for human, rat and mouse chromosomes.
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Gene transfer and metabolic modulators as new therapies for 
pulmonary hypertension. Increasing expression and activity of 
potassium channels in rat and human models. 

Michelakis ED, Dyck JR, McMurtry MS, Wang S, Wu XC, Moudgil R, 
Hashimoto K, Puttagunta L, Archer SL. 

Department of Medicine (Cardiology), University of Alberta, Edmonton, 
Canada. 

Chronic Hypoxic Pulmonary Hypertension (CH-PHT) is characterized by
pulmonary artery (PA) vasoconstriction and cell proliferation/hypertrophy. PA
smooth muscle cell (PASMC) contractility and proliferation are controlled by
cytosolic Ca++ levels, which are largely determined by membrane potential
(E(M)). E(M) is depolarized in CH-PHT due to decreased expression and
functional inhibition of several redox-regulated, 4-aminopyridine (4-AP)
sensitive, voltage-gated K+ channels (Kv1.5 and Kv2.1). Humans with
Pulmonary Arterial Hypertension (PAH) also have decreased PASMC
expression of Kv1.5 and Kv2.1. We speculate this "K+-channelopathy"
contributes to PASMC depolarization and Ca++ overload, thus promoting
vasoconstriction and PASMC proliferation. We hypothesized that restoration of
Kv channel expression in PHT might eventually be beneficial. METHODS:
Two strategies were used to increase Kv channel expression in PASMCs: oral
administration of a metabolic modulator drug (Dichloroacetate, DCA) and
direct Kv gene transfer using an adenovirus (Ad5-Kv2.1). DCA, a pyruvate
dehydrogenase kinase inhibitor, promotes a more oxidized redox state
mimicking normoxia and previously has been noted to increase K+ current in
myocytes. Rats were given DCA in the drinking water after the development of
CH-PHT and hemodynamics were measured approximately 5 days later. We
also tested the ability of Ad5-Kv2.1 to increase Kv2.1 channel expression and
function in human PAs ex vivo. RESULTS: The DCA-treated rats had
decreased PVR, RVH and PA remodeling compared to the control CH-PHT rats
(n=5/group, p<0.05). DCA restored Kv2.1 expression and PASMC Kv current
density to near normoxic levels. Adenoviral gene transfer increased expression
of Kv2.1 channels and enhanced 4-AP constriction in human PAs.
CONCLUSION: Increasing Kv channel function in PAs is feasible and might
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EXHIBIT H 

Downregulation of voltage-gated K+ channels
in rat heart with right ventricular hypertrophy
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Lee, Jong-Kook, Atsushi Nishiyama, Fukushi Kambe, 
Hisao Seo, Susumu Takeuchi, Kaichiro Kamiya, Itsuo
Kodama, and Junji Toyama. Downregulation of voltage-gated
K+ channels in rat heart with right ventricular hypertrophy.
Am. J. Physiol. 277 (Heart Circ. Physiol. 46): H1725-H1731,
1999. The effects of myocardial hypertrophy on
mRNA expression levels of voltage-gated K+ channels were
investigated using monocrotaline (MCT)-induced pulmonary
hypertensive rats. The ratio of right ventricle weight to left
ventricle plus septum weight on day 28 was increased significantly
compared with control rats [control vs. MCT: 0.27 ±
0.01 vs. 0.58 ± 0.03 ms (n = 8-13); P < 0.05]. Electrocardiograms
showed that QRS duration [control vs. MCT: 26.4 ± 2.6
ms vs. 31.5 ± 5.8 ms (n = 6); P < 0.05], Q-T interval [control
vs. MCT: 100.8 ± 8.9 ms vs. 110.0 ± 4.2 ms (n = 6); P < 0.05],
and corrected Q-T interval [Q-Tc: control vs. MCT: 8.4 ± 0.7
ms vs. 10.2 ± 0.4 ms (n = 6); P < 0.05] were prolonged
significantly on day 28. mRNA levels of Kv1.2, 1.5, 2.1, 4.2,
and 4.3 for day 28 assessed by ribonuclease protection assays
were decreased significantly from control by 60 ± 10, 76 ± 3,
58 ± 5, 81 ± 5, and 45 ± 12%, respectively (n = 3; P < 0.005),
and Kv1.4 mRNA level for day 28 was unaffected [Kv1.4,
control vs. MCT: 1.0 ± 0.28 vs. 0.88 ± 0.44 (arbitrary units)
(n = 3); not significant (NS)]. On the other hand, there was no
significant difference between control and MCT rats in mRNA
levels of these Kv channels for day 14 [Kv1.2 (control vs.
MCT): 1.0 ± 0.25 vs. 0.87 ± 0.18 (n = 3), NS; Kv1.4: 1.0 ± 0.22
vs. 1.27 ± 0.37 (n = 3), NS; Kv1.5: 1.0 ± 0.16 vs. 0.91 ± 0.28
(n = 3), NS; Kv2.1: 1.0 ± 0.26 vs. 0.99 ± 0.25 (n = 3), NS;
Kv4.2: 1.0 ± 0.15 vs. 1.22 ± 0.28 (n = 3), NS; Kv4.3: 1.0 ± 0.20
vs. 1.21 ± 0.28 (n = 3), NS]. These findings suggest that
altered ventricular repolarization at the advanced stage of
hypertrophy may be the result of an inhibition of gene
expression of multiple types of voltage-gated K+ channels.

ventricular hypertrophy; voltage-gated potassium channels; 
messenger ribonucleic acid expression 



CLINICAL STUDIES have suggested that ventricular hypertrophy is associated
with a greater risk of sudden cardiac death probably caused by lethal
ventricular arrhythmias (18). Alterations of repolarization are often
recognized in clinical electrocardiograms (ECGs) with the development of
ventricular hypertrophy. Cellular electrophysiological studies have shown that
these alterations in repolarization are caused by the prolongation of action
potential duration (APD) (1). APD prolongation has been ascribed to a decrease
of the transient outward K+ current (Ito) density or an increase of the L-type
Ca2+ current (ICa) density in a variety of experimental models of cardiac
hypertrophy in rats (5, 7, 15, 27, 31), cats (12, 16, 24), and guinea pigs
(26). Comparable changes in Ito and ICa were also reported in human patients
with an advanced stage of congestive heart failure (6).

In a recent study using rats with monocrotaline (MCT)-induced right
ventricular (RV) hypertrophy, we reported (17) that hypertrophy was associated
with stage-dependent changes in Ito and ICa: the APD prolongation in the early
compensated stage of hypertrophy may be caused mainly by an increase of ICa
density, whereas the APD prolongation in the advanced stage of hypertrophy may
be the result of a reduction of Ito density. The decrease of Ito density at
the late stage of hypertrophy is consistent with previous reports on other
models of ventricular hypertrophy (5, 7, 31). In adult rat hearts, many
voltage-gated K+ channel subunits have been cloned, which include Kv1.2,
Kv1.4, Kv1.5, Kv2.1, Kv4.2, and Kv4.3. Kv4.2 and Kv4.3 of the Shal family are
the most likely candidates for Ito (3, 9), whereas Kv1.2 and Kv1.5 of the
Shaker family and Kv2.1 of the Shab family are considered as candidates for
other delayed rectifier K+ channels sensitive to 4-aminopyridine (4-AP) or
tetraethylammonium (TEA) (4, 8). It has been shown in several studies that the
expression of these cloned K+ channels is affected in certain
pathophysiological conditions including cardiac hypertrophy and hormonal
abnormalities (20, 23, 29, 30). The molecular mechanisms for altered
repolarization in cardiac hypertrophy are, however, still unsettled and
controversial (20, 30).

In the present study, we investigated changes in voltage-gated K+ channel gene
expression in hypertrophied rat hearts with MCT-induced pulmonary
hypertension. mRNA levels of Kv1.2, Kv1.4, Kv1.5, Kv2.1, Kv4.2, and Kv4.3
α-subunits were measured by ribonuclease protection assay (RPA). The results
have revealed that not only Kv4.2 and Kv4.3 but also Kv1.2, Kv1.5, and Kv2.1
mRNA are downregulated in the hypertrophied ventricle. Such alteration in
multiple types of voltage-gated K+ channel gene expression may contribute to
the repolarization delay in an advanced stage of ventricular hypertrophy.

MATERIALS AND METHODS 

Animals. Five-week-old male Wistar rats weighing 170-190 g were treated with
MCT (Sigma, St. Louis, MO) to produce pulmonary hypertension as described
previously (14, 17, 22). In brief, a single dose of 60 mg/kg MCT, which was
dissolved in 1 N HCl neutralized with 0.5 N NaOH and diluted with sterile
distilled water to obtain a 2% solution, was injected subcutaneously into the
interscapular region. In control rats of corresponding age and weight, saline
was injected instead of MCT. The rats were allowed to eat freely from a supply
of standard rat chow. The animals were killed under ether anesthesia on the
day of MCT or saline injection (day 0) or 7, 14, 21, and 28 days after the
injection. The hearts were removed quickly and used for estimation of RV
hypertrophy as well as for cell isolation and mRNA measurements. RV
hypertrophy was estimated by measuring the ratio of the RV free wall tissue
weight to body weight (BW) and that of RV weight to left ventricular free wall
plus septum (LV+S) weight.

Electrocardiograms. ECGs were recorded immediately be- 
fore and on days 14 and 28 after the injection of MCT. Under 
anesthesia (20 mg/kg pentobarbital sodium ip), leads I and II 
were recorded. The signals were stored with a digital audio 
recording system (Sony, Tokyo, Japan), and the ECG parameters
were analyzed using software (Softron, Tokyo, Japan)
programmed for the analysis of ECG parameters in rodents. 

Ribonuclease protection assay. For the RPA, rat Kv1.2, Kv1.4, Kv1.5, Kv2.1,
Kv4.2, and Kv4.3 α-subunit cDNA fragments were amplified by RT-PCR. The
nucleotide sequences of the primers and the amplified regions are described
here. Nucleotide numbers for each primer correspond to those from the
translation start site: Kv1.2: sense 5'-AAGCTTTAACTGATGTCTGATTGAAACCTA-3',
antisense 5'-GATGCTGGCTCCATGGGTGAC-3', nucleotides 1,487-1,743 (21); Kv1.4:
sense 5'-AAGCTTTCTACTTGTTCTTCCCTGGGGGAC-3', antisense
5'-TGCATCACTTATTTGATATGC-3', nucleotides 1,801-2,132 (32); Kv1.5: sense
5'-CCGAGTATTTAAGCCCACCTG-3', antisense 5'-CTAAGCmTTAAAGTCAAATTTG-3',
nucleotides 1,888-2,144 (28); Kv2.1: sense
5'-AAGCTTGCTCTGGTTTCTTCGTGGAGAGTC-3', antisense 5'-CACGCTGTAGAGCAGCTGACC-3',
nucleotides 1,931-2,295 (11); Kv4.2: sense 5'-TACCGCACGGGGAAGCTTCACTAT-3',
antisense 5'-TGGAACTGTTTCCACCACATTCGC-3', nucleotides 295-624 (2); Kv4.3:
sense 5'-AAGCTTGGCACCCCAGAAGAGGAGCATG-3', antisense
5'-GTTGGAGTTGGGCAGGTGCGTGGT-3', nucleotides 1,372-1,626 (10). A Hind III site
(AAGCTT) was introduced into the 5' end of the sense primers of the Kv1.2,
Kv1.4, Kv2.1, and Kv4.3 (underlined). In Kv4.2, a Hind III site is present in
the coding region (underlined). The amplified cDNA was cloned into pGEM-T
vector using the TA cloning system (Promega, Madison, WI).
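
The primer description above can be checked mechanically. The following is a minimal sketch, assuming the sense-primer sequences and nucleotide ranges read as printed in this section once the scanning gaps are closed; the dictionary names and helper functions are illustrative and are not part of the published methods. It locates the Hind III recognition sequence in each sense primer and computes the length of each amplified region.

```python
# Hind III recognition sequence named in the Methods.
HIND_III = "AAGCTT"

# Sense primers (5'->3') as listed in the text; layout is illustrative.
SENSE_PRIMERS = {
    "Kv1.2": "AAGCTTTAACTGATGTCTGATTGAAACCTA",
    "Kv1.4": "AAGCTTTCTACTTGTTCTTCCCTGGGGGAC",
    "Kv1.5": "CCGAGTATTTAAGCCCACCTG",
    "Kv2.1": "AAGCTTGCTCTGGTTTCTTCGTGGAGAGTC",
    "Kv4.2": "TACCGCACGGGGAAGCTTCACTAT",
    "Kv4.3": "AAGCTTGGCACCCCAGAAGAGGAGCATG",
}

# Amplified regions (first, last nucleotide from the translation start site).
AMPLIFIED_REGION = {
    "Kv1.2": (1487, 1743),
    "Kv1.4": (1801, 2132),
    "Kv1.5": (1888, 2144),
    "Kv2.1": (1931, 2295),
    "Kv4.2": (295, 624),
    "Kv4.3": (1372, 1626),
}


def hind_iii_offset(primer: str) -> int:
    """Return the 0-based offset of the first Hind III site, or -1 if absent."""
    return primer.find(HIND_III)


def region_length(first: int, last: int) -> int:
    """Length in nucleotides of an inclusive numbered region."""
    return last - first + 1


SITE_OFFSETS = {name: hind_iii_offset(seq) for name, seq in SENSE_PRIMERS.items()}
REGION_LENGTHS = {name: region_length(*span) for name, span in AMPLIFIED_REGION.items()}

for name in SENSE_PRIMERS:
    print(name, "Hind III offset:", SITE_OFFSETS[name],
          "amplified length:", REGION_LENGTHS[name], "nt")
```

An offset of 0 corresponds to a site added at the 5' end of the primer, a positive offset to a site inside the primer (as described for Kv4.2), and -1 to a primer without the site (Kv1.5, whose plasmid was linearized with a different enzyme).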

The plasmids containing cDNAs were linearized by digestion with an appropriate
restriction enzyme (Hind III for Kv1.2, Kv1.4, Kv2.1, Kv4.2, and Kv4.3; NcoI
present in pGEM-T vector for Kv1.5). Antisense cRNA probes were prepared using
a MAXIscript kit (Ambion, Austin, TX) and [α-32P]UTP (Du Pont-New England
Nuclear). The cyclophilin cRNA probe was also prepared from the cDNA purchased
from Ambion (pTRI-cyclophilin-rat antisense control template, nucleotides
38-142) to detect cyclophilin mRNA as an internal control. RPA was performed
using a HybSpeed RPA kit (Ambion) according to the manufacturer's protocol.
Hybridization of the two probes [2 × 10** counts/min (cpm) Kv4.2 cRNA and
2 × 10^ cpm cyclophilin cRNA] with 10 µg total RNA was carried out at 68°C for
10 min, followed by digestion with RNase A and RNase T1 at 37°C for 30 min.
The reaction was terminated by addition of sodium dodecyl sulfate and
proteinase K, followed by phenol-chloroform extraction and ethanol
precipitation. The protected fragments were visualized by autoradiography
after electrophoresis on a 5% polyacrylamide/8 M urea gel. Quantitative
analysis was carried out using a Fujix Bio-image Analyzer, with which we
measured the radioactivity of the bands in a selected area. Each mRNA level of
Kv channels was normalized by the levels of cyclophilin. The mRNA of each lane
in the gels is from different animals.
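
The normalization step can be illustrated with a short sketch. It assumes that each lane yields one band intensity for the Kv probe and one for cyclophilin, and that levels are then scaled so the control mean equals 1.0 (consistent with the "1.0 ± SE" control values reported in Results). The function name and all band intensities below are hypothetical and are not data from the study.

```python
from statistics import mean


def normalize_to_control(kv_counts, cyclo_counts, is_control):
    """Divide each Kv band by its cyclophilin band, then scale so that the
    mean of the control lanes equals 1.0 (arbitrary units)."""
    ratios = [kv / cyclo for kv, cyclo in zip(kv_counts, cyclo_counts)]
    control_mean = mean(r for r, ctrl in zip(ratios, is_control) if ctrl)
    return [r / control_mean for r in ratios]


# Hypothetical band intensities: three control lanes followed by three MCT lanes.
KV_COUNTS = [200.0, 220.0, 180.0, 50.0, 40.0, 60.0]
CYCLO_COUNTS = [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
IS_CONTROL = [True, True, True, False, False, False]

RELATIVE = normalize_to_control(KV_COUNTS, CYCLO_COUNTS, IS_CONTROL)
CONTROL_MEAN = mean(RELATIVE[:3])
MCT_MEAN = mean(RELATIVE[3:])

print("control mean:", round(CONTROL_MEAN, 3), "MCT mean:", round(MCT_MEAN, 3))
```

Because every lane comes from a different animal, the per-lane ratio is the unit of analysis; scaling to the control mean only changes the units, not the relative difference between groups.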

Statistics. Data are expressed as means ± SE. Statistical
analyses were performed using one-way analysis of variance 
with multiple comparisons. Differences were considered sig- 
nificant at P< 0.05. 
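
As an illustration of the summary statistics named above, the sketch below computes a mean ± SE and the F statistic of a one-way analysis of variance using only the Python standard library. The multiple-comparison procedure is not named in the text, so it is not reproduced here; the function names and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_se(values):
    """Return (mean, standard error of the mean) for one group."""
    return mean(values), stdev(values) / sqrt(len(values))


def one_way_anova_f(groups):
    """Return the F statistic of a one-way ANOVA across the given groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical example groups (not data from the study).
EXAMPLE_GROUPS = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
EXAMPLE_MEAN, EXAMPLE_SE = mean_se(EXAMPLE_GROUPS[0])
EXAMPLE_F = one_way_anova_f(EXAMPLE_GROUPS)

print("mean ± SE:", EXAMPLE_MEAN, "±", round(EXAMPLE_SE, 3), "F:", round(EXAMPLE_F, 2))
```

The F statistic would then be compared with the critical value for the corresponding degrees of freedom at P < 0.05.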

RESULTS 

Characteristics of experimental animals. Table 1 sum- 
marizes BW and heart, lung, liver, and kidney weight 
before and after the injection of MCT. There was no 
significant difference in BW between groups on days 0 
and 14, but BW of MCT-treated rats was significantly
decreased by 19.3% compared with those of control rats 
on day 28. The ratio of RV weight to BW and the ratio of 
RV weight to LV+S weight were both increased signifi- 
cantly on days 14 and 28, whereas the ratio of LV+S 
weight to BW was unaffected during the entire observa- 
tion period. There was no significant difference in the 
ratio of kidney and liver weights to BW. On the other 
hand, there was a significant increase in the ratio of 
lung weight to BW in the MCT rats, probably because of 
the primary pathological effects of MCT on the lung. 
Eleven of twelve MCT rats showed the signs of right- 
sided heart failure during the following week, including 
tachypnea, ascites, pleural effusion, edematous extremi- 
ties, and piloerection, and ten of twelve MCT rats died 
by day 35. 

Electrocardiograms. ECG leads I and II were re- 
corded every week after MCT injection. Figure 1 shows 
the representative tracings of ECG lead II recorded in 



Table 1. Body and organ weights in control and MCT-treated rats

            n   BW, g        RV/BW,         (LV+S)/BW,    RV/(LV+S)      Lung/BW,       Liver/BW,     Kidney/BW,
                             ×10^-3         ×10^-3                       ×10^-3         ×10^-3        ×10^-3
Day 0
  Control   6   144 ± 5.1    0.75 ± 0.04    2.88 ± 0.06   0.26 ± 0.02    7.00 ± 0.17    41.5 ± 4.23   9.70 ± 0.36
  MCT       6   145 ± 6.2    0.77 ± 0.06    2.89 ± 0.06   0.27 ± 0.03    7.00 ± 0.55    42.9 ± 2.22   9.50 ± 0.28
Day 14
  Control   5   243 ± 4.2    0.60 ± 0.02    2.39 ± 0.12   0.26 ± 0.02    6.42 ± 0.36    44.8 ± 2.28   8.93 ± 0.18
  MCT       5   233 ± 7.1    0.89 ± 0.06*   2.54 ± 0.11   0.35 ± 0.02*   8.26 ± 0.07*   46.5 ± 0.81   9.23 ± 0.57
Day 28
  Control   8   319 ± 8.0    0.64 ± 0.02    2.36 ± 0.12   0.27 ± 0.01    5.80 ± 0.57    42.3 ± 1.50   7.60 ± 0.35
  MCT      13   251 ± 5.1*   1.45 ± 0.06*   2.51 ± 0.06   0.58 ± 0.03*   10.0 ± 0.55*   40.9 ± 2.00   7.60 ± 0.18

Values are means ± SE. MCT, monocrotaline; BW, body weight; RV, right
ventricle; LV+S, left ventricle + septum. * Significantly different from
control at P < 0.05.

Fig. 1. Body surface electrocardiograms (ECG) of extremity lead (II) before
monocrotaline (MCT) and saline injection and on days 14 and 28 after
injection. Top traces are records from a control rat, and bottom traces are
from an MCT rat. Scale bars: 1 mV, 100 ms.


the same rats immediately before injection (day 0) and on days 14 and 28 after
the injection. In the ECGs of the MCT rat, the T wave was flattened and the
Q-T interval was prolonged on day 14, and the prolongation was remarkable on
day 28, whereas the ECGs of the control rat remained unchanged over the entire
observation period. Table 2 summarizes the ECG data obtained from control and
MCT-treated rats on days 0, 14, and 28 after injection. The Q-T interval was
significantly prolonged, on average, by 9.5 and 10.2% on days 14 and 28,
respectively, and the interval corrected by the heart rate (Q-Tc) was also
significantly prolonged by 6.9 and 13.7% on days 14 and 28. Although QRS
duration did not show any difference on day 14, it was prolonged by 16.2% on
day 28. P-R interval was unchanged over the observation period.

Gene expression of Kv channels. To examine the effects of cardiac hypertrophy
on mRNA expression of cloned K+ channels, mRNA levels were measured with RPA,
using the hearts obtained from control and MCT-treated rats killed on day 28.
mRNA levels of three Shaker (Kv1.2, 1.4, 1.5), one Shab (Kv2.1), and two Shal
(Kv4.2 and Kv4.3) channels were examined. These mRNAs were readily detected.
Cyclophilin mRNA expression levels were used for the internal control.

Table 2. ECG parameters recorded from control and MCT-treated rats

            n   QRS Time,      P-R Time,     Q-T Interval,   Q-Tc
                ms             ms            ms
Day 0
  Control   5   24.0 ± 2.4     41.3 ± 0.5    96.6 ± 3.8      8.2 ± 0.2
  MCT       5   24.2 ± 1.8     39.5 ± 9.1    96.2 ± 5.8      8.3 ± 0.3
Day 14
  Control   6   24.0 ± 2.4     39.0 ± 4.0    100.8 ± 3.6     8.7 ± 0.1
  MCT       5   24.6 ± 1.8     43.0 ± 4.0    110.4 ± 1.9*    9.3 ± 0.1*
Day 28
  Control   6   26.4 ± 2.6     41.5 ± 4.9    98.8 ± 4.9      8.8 ± 0.5
  MCT       5   31.5 ± 5.8*    39.0 ± 7.7    110.0 ± 4.2*    10.2 ± 0.4*

Values are means ± SE. Q-Tc, corrected Q-T interval. * Significantly different
from control at P < 0.05.

Figure 2 shows the results for the Shaker family channels. The expression
levels of Kv1.2 and Kv1.5 channels normalized to cyclophilin expression levels
were significantly lower in the MCT-treated rats than in control rats [Kv1.2
(control vs. MCT): 1.0 ± 0.12 vs. 0.4 ± 0.13 (arbitrary units) (n = 3),
P < 0.05; Kv1.5: 1.0 ± 0.05 vs. 0.24 ± 0.03 (n = 3), P < 0.01] (Fig. 2).
Unlike Kv1.2 and Kv1.5, the expression levels of Kv1.4 mRNA did not show a
significant difference between the two groups [Kv1.4 (control vs. MCT):
1.0 ± 0.28 vs. 0.88 ± 0.44 (n = 3); not significant (NS)] (Fig. 2). The
expression levels of Kv2.1 channel mRNA were significantly decreased in the
MCT-treated rats compared with control rats [Kv2.1 (control vs. MCT):
1.0 ± 0.02 vs. 0.42 ± 0.05 (n = 3); P < 0.05] (Fig. 3). Figure 4 shows the
expression levels of the Shal family channels. The expression levels of Kv4.2
were markedly decreased in the MCT rats [Kv4.2 (control vs. MCT): 1.0 ± 0.08
vs. 0.19 ± 0.05 (n = 3); P < 0.01]. Kv4.3 mRNA expression levels were also
significantly decreased, but the extent of decrease was moderate compared with
that of Kv4.2 [Kv4.3 (control vs. MCT): 1.0 ± 0.05 vs. 0.55 ± 0.12 (n = 3);
P < 0.05].
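
The percentage decreases quoted in the abstract follow directly from these normalized day 28 means. A minimal sketch of the arithmetic, using the MCT group means reported above with the control mean taken as 1.0 (the variable and function names are illustrative):

```python
# Day 28 MCT group means (arbitrary units, control mean = 1.0) from the text above.
DAY28_MCT_LEVEL = {
    "Kv1.2": 0.40,
    "Kv1.5": 0.24,
    "Kv2.1": 0.42,
    "Kv4.2": 0.19,
    "Kv4.3": 0.55,
}


def percent_decrease(level: float, control: float = 1.0) -> float:
    """Decrease of a normalized mRNA level relative to control, in percent."""
    return (control - level) / control * 100.0


PERCENT_DECREASE = {
    name: round(percent_decrease(level)) for name, level in DAY28_MCT_LEVEL.items()
}

for name, value in PERCENT_DECREASE.items():
    print(f"{name}: {value}% below control")
```

The rounded values correspond to the mean decreases of 60, 76, 58, 81, and 45% given in the abstract for Kv1.2, Kv1.5, Kv2.1, Kv4.2, and Kv4.3, respectively.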

mRNA levels of Kv channels on day 14 after injection were also measured to
examine the effects of cardiac hypertrophy at the early stage. There was a
slight increase in Kv1.4, Kv4.2, and Kv4.3 [Kv1.4 (control vs. MCT):
1.0 ± 0.22 vs. 1.27 ± 0.37 (n = 3), NS; Kv4.2: 1.0 ± 0.15 vs. 1.22 ± 0.28
(n = 3), NS; Kv4.3: 1.0 ± 0.20 vs. 1.21 ± 0.28 (n = 3), NS] and a slight
decrease in Kv1.2 and Kv1.5 [Kv1.2 (control vs. MCT): 1.0 ± 0.25 vs.
0.87 ± 0.18 (n = 3), NS; Kv1.5: 1.0 ± 0.16 vs. 0.91 ± 0.28 (n = 3), NS], but
these differences did not reach statistical significance. The mRNA level of
Kv2.1 was identical between groups [1.0 ± 0.26 vs. 0.99 ± 0.25 (n = 3), NS].

Fig. 2. Kv1 channel mRNA expression in control and hypertrophied right
ventricles. Hypertrophied right ventricles were obtained from adult rats with
pulmonary hypertension induced by a single injection of MCT. Total RNAs were
extracted from right ventricles for ribonuclease protection assay on day 28
after MCT injection. A, C, and E: mRNA expression of Kv1.2, Kv1.4, and Kv1.5,
respectively. Cyclo, cyclophilin mRNA as internal control; Cyclo*, mRNA of
cyclophilin with less exposure. B, D, and F: average amounts of Kv1.2 (B),
Kv1.4 (D), and Kv1.5 (F) mRNA are presented as ratio to internal control
(n = 3). Significantly different from control: *P < 0.05, **P < 0.01. NS, not
significant.

DISCUSSION 

MCT caused hypertrophy in RV. In the present study, we investigated the
underlying molecular mechanisms of the altered repolarization in ventricular
hypertrophy, using rats with RV hypertrophy secondary to MCT-induced pulmonary
hypertension. MCT is known to cause pulmonary hypertension in rats through
endothelial cell damage, medial thickening of the muscular pulmonary arteries,
and neomuscularization of nonmuscular distal arteries. A single injection of
MCT caused macroscopic RV hypertrophy without any morphological changes in the
LV. An increase of the ratios of RV and lung weights to BW was observed in the
MCT-treated rats, but the ratios of kidney and liver weights to BW were not
affected by the treatment. Recent experimental studies have indicated that an
increase of endogenous endothelin-1, a potent endothelium-derived
vasoconstrictor peptide, is involved in the pathogenesis of MCT-induced
pulmonary hypertension (22, 25). However, the lack of morphological change in
the LV suggests that the hypertrophy may not be the result of direct action of
this compound on the heart but the result of pressure overload caused by
pulmonary hypertension. The MCT rats on day 14 are considered to be in a
compensated state of hypertrophy, because they showed normal growth and no
physical signs of right-sided heart failure. On the other hand, the MCT rats
on day 28 had more of the properties of heart failure, because they showed a
significant decrease in BW and physical signs of right-sided heart failure,
including tachypnea, ascites, pleural effusion, edematous extremities, and
piloerection, in the following week.

Electrophysiological alterations in ventricular hypertrophy. In association
with the development of hypertrophy, body surface electrocardiograms showed
prolongation of Q-T and Q-Tc intervals and QRS duration, whereas P-R intervals
were not affected (Table 2). In our previous electrophysiological experiments
on single myocytes isolated from MCT-treated rats, cell membrane capacitance
and APD of RV cells were increased progressively from day 14 to day 28,
whereas other parameters of the action potential (resting membrane potential
and action potential amplitude) were unaffected (17). The APD at 90%
repolarization of the MCT-treated RV cells was increased to 192% of control
(n = 10) on day 28 after the injection. As to the change of ionic currents
responsible for the APD prolongation at the late stage of MCT-treated rats, we
reported a significant reduction of Ito without any changes in its voltage
dependence and inactivation kinetics (17). The changes in cell membrane
capacitance and action potential configuration observed in our RV hypertrophy
model are qualitatively similar to those in the reports by other investigators
on the LV hypertrophy induced in rats by aortic banding and renovascular
hypertension (4, 19, 32).

Molecular mechanisms of altered repolarization in ventricular hypertrophy. We
have shown in the present study that mRNA levels of Kv4.2 and Kv4.3 are
decreased in MCT-treated rats at the advanced stage of hypertrophy. Kv4.2 and
Kv4.3 are supposed to be the most likely candidates for Ito in adult rat
ventricles (3, 9). The reduction of Ito density at the advanced stage of RV
hypertrophy in MCT-treated rats could be the result of downregulation of Kv4.2
and Kv4.3 gene expression. In experiments using renovascular hypertensive
rats, Takimoto et al. (30) demonstrated significant reduction of mRNA levels
for Kv4.2 and Kv4.3 in the LV in association with the progress of hypertrophy
but no significant changes in mRNA levels for Shaker-related (Kv1.2, Kv1.4,
Kv1.5), Shab-related (Kv2.1), and KvLQT1 channels.

Fig. 2, E and F. Continued. [Panel labels: Control, MCT; Kv1.5 probe, Cyclo
probe; protected fragments Kv1.5, Cyclo, Cyclo*.]

Fig. 3. Kv2.1 mRNA levels in right ventricles of control and MCT-treated rats
on day 28 after injection. A: Kv2.1 mRNA measured by ribonuclease protection
assay. Cyclo, cyclophilin mRNA as internal control; Cyclo*, mRNA of
cyclophilin with less exposure. B: average amount of Kv2.1 mRNA is presented
as ratio to internal control (n = 3). **Significantly different from control
at P < 0.01.

In our experiments using MCT-treated rats, the 
mRNA levels for Kvl.2, Kvl.5, and Kv2.1 were also 
decreased significantly at the advanced stage of RV 
hypertrophy. Reasons for the discrepancy between our 
data and those reported by Takimoto et al. (30) are 
unclear; it might be related to different procedures used 
to produce hypertrophy. 

The physiological and pathological roles of these cloned voltage-gated K+
channels in native cardiac cells are still unsettled (4, 8). Heterologous
expression of Kv1.2, Kv1.5, and Kv2.1 in Xenopus oocytes has been shown to
cause delayed-rectifier type current (IK) or rapidly activating sustained
outward currents (Isus, Iss, or IKur), which are sensitive to 4-AP and TEA. In
adult rat ventricular cells, the amplitude of these delayed-rectifier or
sustained type outward currents is much less than that of Ito. This makes it
difficult to detect the change of their current density in association with
the progress of ventricular hypertrophy. Nevertheless, we cannot rule out some
obligatory roles of the downregulation of Kv1.2, Kv1.5, and Kv2.1 gene
expression in the APD prolongation in hypertrophied ventricular cells.

As for the early stage of hypertrophy, there was no 
significant difference in mRNA levels of these Kv 
channels. Thus there is a discrepancy between the 
increase of Ito density and the mRNA levels of the
channels. At present, there is no clear interpretation 
for this discrepancy. There might be unknown subcellu- 
lar factors that affect the protein synthesis or the 
availability of the channels in the pathological condition. 

Limitation of study. Because the presence of mRNA 
does not necessarily mean the presence of the encoded 
proteins, studies measuring the mRNA levels have 
limitations for the understanding of the mechanism of 
the pathophysiological changes. To further elucidate 
the mechanism, studies measuring protein levels, including
Western blot analysis and immunohistochemistry,
will be required.

Address for reprint requests and other correspondence: I. Kodama,
Dept. of Circulation, Research Institute of Environmental Medicine,
Nagoya Univ., Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan (E-mail:
ikodama@riem.nagoya-u.ac.jp).

Received 10 September 1998; accepted in final form 10 June 1999.



K+ CHANNEL GENE EXPRESSION IN MYOCARDIAL HYPERTROPHY 






REFERENCES 

1. Aronson, R. S. Characteristics of action potentials of hypertrophied
myocardium from rats with renal hypertension. Circ. Res. 47: 443-454, 1980.

2. Baldwin, T. J., M. L. Tsaur, G. A. Lopez, Y. N. Jan, and L. Y. Jan.
Characterization of a mammalian cDNA for an inactivating voltage-sensitive K+
channel. Neuron 7: 471-483, 1991.

3. Barry, D. M., and J. M. Nerbonne. Differential expression of voltage-gated
K+ channel subunits in adult rat heart. Relation to functional K+ channels?
Circ. Res. 77: 361-369, 1995.

4. Barry, D. M., and J. M. Nerbonne. Myocardial potassium channels:
electrophysiological and molecular diversity. Annu. Rev. Physiol. 58: 363-394,
1996.

5. Bénitah, J.-P., A. M. Gomez, P. Bailly, J.-P. Da Ponte, G. Berson, C.
Delgado, and P. Lorente. Heterogeneity of the early outward current in
ventricular cells isolated from normal and hypertrophied rat heart. J.
Physiol. (Lond.) 469: 111-138, 1993.

6. Beuckelmann, D. J., M. Nabauer, and E. Erdmann. Alteration of K+ currents
in isolated human ventricular myocytes from patients with terminal heart
failure. Circ. Res. 73: 379-385, 1993.

7. Cerbai, E., M. Barbieri, Q. Li, and A. Mugelli. Ionic basis of action
potential prolongation of hypertrophied myocytes isolated from hearts of
spontaneously hypertensive rats of different ages. Cardiovasc. Res. 28:
1180-1187, 1994.

8. Deal, K. K., S. K. England, and M. M. Tamkun. Molecular physiology of
cardiac potassium channels. Physiol. Rev. 76: 49-67, 1996.

9. Dixon, J. E., and D. McKinnon. Quantitative analysis of potassium channel
mRNA expression in atrial and ventricular muscle of rats. Circ. Res. 75:
252-260, 1994.

10. Dixon, J. E., W. Shi, H.-S. Wang, C. McDonald, H. Yu, R. S. Wymore, I. S.
Cohen, and D. McKinnon. Role of the Kv4.3 K+ channel in ventricular muscle. A
molecular correlate for the transient outward current. Circ. Res. 79: 659-668,
1996.

11. Frech, G. C., A. M. VanDongen, G. Schuster, A. M. Brown, and R. H. Joho. A
novel potassium channel with delayed rectifier properties isolated from rat
brain by expression cloning. Nature 340: 642-645, 1989.

12. Furukawa, T., R. J. Myerburg, N. Furukawa, S. Kimura, and A. L. Bassett.
Metabolic inhibition of ICa,L and IK differs in feline left ventricular
hypertrophy. Am. J. Physiol. 266 (Heart Circ. Physiol. 35): H1121-H1131, 1994.

14. Hayashi, Y., J. F. Hussa, and J. Lalich. Cor pulmonale in rats. Lab.
Invest. 16: 875-881, 1967.

15. Keung, E. C. Calcium current is increased in isolated adult myocytes from
hypertrophied rat myocardium. Circ. Res. 64: 753-763, 1989.

16. Kleiman, R. B., and S. R. Houser. Calcium currents in normal and
hypertrophied isolated feline ventricular myocytes. Am. J. Physiol. 255 (Heart
Circ. Physiol. 24): H1434-H1442, 1988.

17. Lee, J. K., I. Kodama, H. Honjo, T. Anno, K. Kamiya, and J. Toyama.
Stage-dependent changes in membrane currents in rats with
monocrotaline-induced right ventricular hypertrophy. Am. J. Physiol. 272
(Heart Circ. Physiol. 41): H2833-H2842, 1997.

18. Levy, D., R. J. Garrison, D. D. Savage, W. B. Kannel, and W. P. Castelli.
Prognostic implications of echocardiographically determined left ventricular
mass in the Framingham Heart Study. N. Engl. J. Med. 322: 1561-1566, 1990.

19. Li, Q., and E. C. Keung. Effects of myocardial hypertrophy on transient
outward current. Am. J. Physiol. 266 (Heart Circ. Physiol. 35): H1738-H1745,
1994.

20. Matsubara, H., J. Suzuki, and M. Inada. Shaker-related potassium channel,
Kv1.4, mRNA regulation in cultured rat heart myocytes and differential
expression of Kv1.4 and Kv1.5 genes in myocardial development and hypertrophy.
J. Clin. Invest. 92: 1659-1666, 1993.

21. McKinnon, D. Isolation of a cDNA clone coding for putative second
potassium channel indicates the existence of a gene family. J. Biol. Chem.
264: 8230-8236, 1989.

22. Miyauchi, T., R. Yorikane, S. Sakai, T. Sakurai, M. Okada, M. Nishikibe,
M. Yano, I. Yamaguchi, Y. Sugita, and K. Goto. Contribution of endogenous
endothelin-1 to the progression of cardiopulmonary alterations in rats with
monocrotaline-induced pulmonary hypertension. Circ. Res. 73: 887-897, 1993.

23. Nishiyama, A., F. Kambe, K. Kamiya, S. Yamaguchi, Y. Murata, H. Seo, and
J. Toyama. Effects of thyroid and glucocorticoid hormones on Kv1.5 potassium
channel gene expression in the rat left ventricle. Biochem. Biophys. Res.
Commun. 237: 521-526, 1997.

24. Nuss, H. B., and S. R. Houser. Voltage dependence of contraction and
calcium current in severely hypertrophied feline ventricular myocytes. J. Mol.
Cell. Cardiol. 23: 717-726, 1991.

25. Okada, M., C. Yamashita, M. Okada, and K. Okada. Role of endothelin-1 in
beagles with dehydromonocrotaline-induced pulmonary hypertension. Circ. Res.
92: 114-119, 1995.

26. Ryder, K. O., S. M. Bryant, and G. Hart. Membrane current changes in left
ventricular myocytes isolated from guinea pigs after abdominal aortic
coarctation. Cardiovasc. Res. 27: 1278-1287, 1993.

27. Scamps, F., E. Mayoux, D. Charlemagne, and G. Vassort. Calcium current in
single cells isolated from normal and hypertrophied rat heart. Effects of
β-adrenergic stimulation. Circ. Res. 67: 199-208, 1990.

28. Swanson, R., J. Marshall, and J. S. Smith. Cloning and expression of cDNA
and genomic clones encoding three delayed rectifier potassium channels in rat
brain. Neuron 4: 929-939, 1990.

29. Takimoto, K., and E. S. Levitan. Glucocorticoid induction of Kv1.5 K+
channel gene expression in ventricle of rat heart. Circ. Res. 75: 1006-1013,
1994.

30. Takimoto, K., D. Li, K. M. Hershman, P. Li, E. K. Jackson, and E. S.
Levitan. Decreased expression of Kv4.2 and novel Kv4.3 K+ channel subunit
mRNAs in ventricles of renovascular hypertensive rats. Circ. Res. 81: 533-539,
1997.

31. Tomita, F., A. L. Bassett, R. J. Myerburg, and S. Kimura. Diminished
transient outward currents in rat hypertrophied ventricular myocytes.
Circulation 75: 296-303, 1994.

32. Tseng, C. J., G. N. Tseng, A. Schwartz, and M. A. Tanouye. Molecular
cloning and functional expression of a potassium channel cDNA isolated from a
rat cardiac library. FEBS Lett. 268: 63-68, 1990.



Entrez PubMed 



EXHIBIT I 




J Cardiovasc Electrophysiol. 2000 Nov;11(11):1252-61.

Early down-regulation of K+ channel genes and currents in the 
postinfarction heart. 

Huang B, Qin D, El-Sherif N. 

Department of Medicine, State University of New York Health Science Center, 
Brooklyn 11203, USA. 

INTRODUCTION: Down-regulation of key K+ channel subunit gene 
expression and K+ currents is a universal response to cardiac hypertrophy, 
whatever the cause, including the postmyocardial infarction (post-MI) 
remodeled heart. METHODS AND RESULTS: We investigated the hypothesis
that down-regulation of K+ channel genes and currents post-MI occurs early 
and before significant remodeled hypertrophy of the noninfarcted myocardium 
could be detected. We investigated (1) the incidence of induced ventricular 
tachyarrhythmias (VT) in 3-day post-MI rat heart; (2) action potential (AP) 
characteristics of isolated left ventricular (LV) myocytes from sham-operated 
and 3-day post-MI heart; (3) time course of changes in outward K+ currents
Ito-fast(f) and I(K) in isolated myocytes from 3-day and 4-week post-MI
noninfarcted LV and compared the changes with sham-operated animals; and 
(4) changes in the messenger and protein levels of Kv2.1, Kv4.2, and Kv4.3 in 
the LV and right ventricle of 3-day post-MI heart. Sustained VT was induced in 
6 of 10 3-day post-MI rats and in none of 8 sham rats. The membrane 
capacitance of myocytes isolated from 3-day post-MI noninfarcted LV was not 
significantly different from control, whereas membrane capacitance 4-week
post-MI was significantly higher, reflecting the development of hypertrophy. 
AP duration was increased and the density of Ito-f and I(K) were significantly 
decreased in 3-day post-MI LV myocytes compared with sham. The reduced 
density of Ito did not significantly differ in 4-week post-MI LV myocytes, 
whereas the density of I(K) was decreased further at 4 weeks post-MI. The 
changes in Ito-f and I(K) correlated with decreased messenger and protein levels 
of Kv4.2/Kv4.3 and Kv2.1, respectively. CONCLUSION: These results support 
the hypothesis that down-regulation of K+ channel gene expression and current 
in the post-MI LV occurs early and may be dissociated from the slower time 
course of post-MI remodeled hypertrophy. These changes may contribute to 
early arrhythmogenesis of the post-MI heart.

Members of the Kv1 and Kv2 Voltage-Dependent K+
Channel Families Regulate Insulin Secretion

PATRICK E. MACDONALD, XIAO FANG HA, JING WANG, SIMON R. SMUKLER,
ANTHONY M. SUN, HERBERT Y. GAISANO, ANN MARIE F. SALAPATEK, PETER H. BACKX,
and MICHAEL B. WHEELER

Departments of Medicine (H.Y.G., P.H.B., M.B.W.) and Physiology (P.E.M.,
X.F.H., J.W., S.R.S., A.M.S., A.M.F.S., P.H.B., M.B.W.), University of
Toronto, Toronto, Ontario, Canada M5S 1A8



In pancreatic β-cells, voltage-dependent K+ (Kv) channels are
potential mediators of repolarization, closure of Ca2+ channels, and
limitation of insulin secretion. The specific Kv channels expressed in
β-cells and their contribution to the delayed rectifier current and
regulation of insulin secretion in these cells are unclear. High-level
protein expression and mRNA transcripts for Kv1.4, 1.6, and 2.1 were
detected in rat islets and insulinoma cells. Inhibition of these
channels with tetraethylammonium decreased Idr by approximately 85%
and enhanced glucose-stimulated insulin secretion by 2- to 4-fold.
Adenovirus-mediated expression of a C-terminal truncated Kv2.1
subunit, specifically eliminating Kv2 family currents, reduced delayed
rectifier currents in these cells by 60-70% and enhanced
glucose-stimulated insulin secretion from rat islets by 60%.
Expression of a C-terminal truncated Kv1.4 subunit, abolishing Kv1
channel family currents, reduced delayed rectifier currents by
approximately 25% and enhanced glucose-stimulated insulin secretion
from rat islets by 40%. This study establishes that Kv2 and 1 channel
homologs mediate the majority of repolarizing delayed rectifier
current in rat β-cells and that antagonism of Kv2.1 may prove to be a
novel glucose-dependent therapeutic treatment for type 2 diabetes.
(Molecular Endocrinology 15: 1423-1435, 2001)



THE ABILITY OF pancreatic islets of Langerhans to secrete insulin in
response to increased blood glucose levels is essential for the
maintenance of normoglycemia. Dysregulation of islet insulin secretion
is at least partly responsible for the development of type 2 diabetes
mellitus (1). In the β-cell, glucose stimulation is coupled to insulin
secretion through voltage-dependent and voltage-independent mechanisms
(2, 3). Voltage-dependent mechanisms of stimulus-secretion coupling
are better characterized and are described in a number of reviews
(4-7). Briefly, increased glucose metabolism in pancreatic β-cells,
resulting from high postprandial blood glucose, increases the
intracellular ATP:ADP ratio. This leads to closure of ATP-sensitive
K+ (KATP) channels and depolarization of the cell membrane (8), an
effect mimicked by the sulfonylurea drugs independent of blood glucose
(9, 10).

Depolarization of the β-cell membrane results in the opening of L-type
Ca2+ channels, increasing the intracellular Ca2+ concentration
([Ca2+]i) and ultimately stimulating insulin secretion. β-Cell
repolarization is mediated by a delayed rectifier current (Idr)
similar to those generated by voltage-dependent K+ (Kv) or
Ca2+-sensitive voltage-dependent K+ (KCa) channels (5, 11-14).
Accordingly, overexpression of a Kv channel in transgenic mice was
associated with hyperglycemia and hypoinsulinemia, and in an
insulinoma cell line this manipulation attenuated [Ca2+]i increases
associated with glucose stimulation (15). In addition, inhibitors of
Idr are known to enhance [Ca2+]i oscillations (16) and insulin
secretion (11, 13) in a glucose-dependent manner.

Abbreviations: [Ca2+]i, intracellular Ca2+ concentration; EGFP,
enhanced green fluorescent protein; FT, freeze-thaw media; GSIS,
glucose-stimulated insulin secretion; IBMX,
3-isobutyl-1-methylxanthine; HG-RPMI, high-glucose Roswell Park
Memorial Institute medium; Idr, delayed rectifier current; KCa,
Ca2+-sensitive voltage-dependent K+ channel; KRB, Krebs Ringer
bicarbonate; Kv, voltage-dependent K+ channel; LG-RPMI, low-glucose
RPMI; TEA, tetraethylammonium.

There are at least 11 currently known Kv channel families containing
26 homologs (17-22), and of these, members of the Kv1, Kv2, and Kv3
channel families mediate currents similar to those observed in
pancreatic β-cells (5, 23-25). The task of identifying the channel
homologs responsible for repolarization of pancreatic β-cells is
difficult because heterotetrameric Kv channels and channels associated
with regulatory β-subunits often do not exhibit the electrical and
pharmacological properties of the constituent pore-forming subunits
(17, 26-29).

Despite previous studies showing that insulin-secreting cells express
mRNA transcripts for a number of Kv and KCa channels (5) and Kv2.1
protein (11), no functional data exist for a role for specific
channels or channel families in β-cell repolarization and the
regulation of insulin secretion. We have now characterized the mRNA
and protein expression of Kv1 and Kv2 channel family homologs in rat
islets and insulinoma cell lines. Pharmacological agents and
dominant-negative C-terminal truncated Kv1 (Kv1.4N) and Kv2 (Kv2.1N)
channel subunit mutants were used to determine the role of specific
channels in mediating Idr and regulating insulin secretion in the
glucose-responsive HIT-T15 cell line and in rat islets.



RESULTS 

Effect of Idr Inhibition on Insulin Secretion 

HIT-T15 cells or rat islets were incubated with the general Kv and KCa
channel antagonist tetraethylammonium (TEA) at concentrations known to
inhibit delayed rectifier currents while having minimal effects on
KATP channels (12, 30, 31). In HIT-T15 cells, TEA (20 mM) enhanced
glucose-stimulated insulin secretion (GSIS) (from 0.51 ± 0.10 to
1.43 ± 0.14 ng/ml/2 h, n = 15; P < 0.001), but most importantly, had
no effect in the absence of glucose (Fig. 1A). Similarly, GSIS from
rat islets was enhanced by TEA (20 mM) (from 0.17 ± 0.03, n = 24 to
0.81 ± 0.18 ng/islet/h, n = 13; P < 0.01), which had no effect in the
absence of stimulatory concentrations of glucose (control, n = 23;
20 mM TEA, n = 13) (Fig. 1B). TEA enhanced GSIS from rat islets in a
dose-dependent manner (Fig. 1C) with an EC50 of 8.24 mM (n = 9). The
effects of TEA were not related to cellular toxicity, since a 2-h
exposure (20 mM) did not affect the survival of HIT-T15 cells, as
detected by propidium iodide fluorescence (not shown).

Fig. 1. Idr Inhibition Enhances GSIS

The general Kv channel antagonist TEA (20 mM; black bars) enhanced
insulin secretion from HIT-T15 insulinoma cells (A) and isolated rat
islets (B) over 2 h compared to controls (white bars). This effect
occurred only in the presence of stimulatory glucose. In rat islets,
TEA enhanced insulin secretion stimulated by 15 mM glucose in a
dose-dependent manner (C). The half-maximal effect of TEA was observed
at 8.24 mM. *, P < 0.05; **, P < 0.01; and ***, P < 0.001 compared
with controls.
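The dose dependence reported above can be illustrated with a simple Hill curve. The following is a minimal sketch and not the fitting procedure used in the study: the function name `hill_fraction` and the Hill coefficient of 1 are assumptions made for illustration, and only the EC50 of 8.24 mM is taken from the text.

```python
def hill_fraction(conc_mM, ec50_mM=8.24, hill_coeff=1.0):
    """Fraction of the maximal effect at a given TEA concentration.

    ec50_mM is the EC50 reported in the text. hill_coeff = 1.0 is an
    illustrative assumption, since no slope is given in the text.
    """
    if conc_mM < 0:
        raise ValueError("concentration must be non-negative")
    if conc_mM == 0:
        return 0.0
    scaled = conc_mM ** hill_coeff
    return scaled / (scaled + ec50_mM ** hill_coeff)


if __name__ == "__main__":
    # Hypothetical concentrations chosen only to show the curve's shape.
    for conc in (1.0, 5.0, 8.24, 20.0):
        print(f"{conc:>5} mM TEA -> {hill_fraction(conc):.2f} of maximal effect")
```

Under these assumptions, the 20 mM concentration used throughout the secretion experiments lies on the upper part of the curve but below saturation.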

To determine whether TEA's insulinotropic activity was dependent upon
depolarization through KATP channel closure and not glucose per se, we
examined whether TEA could enhance insulin secretion stimulated by
KATP channel inhibition in the absence of glucose. Micromolar
concentrations of the KATP channel antagonist glyburide (Sigma, St.
Louis, MO) have been shown to stimulate insulin secretion from HIT-T15
cells in the absence of glucose (32, 33). Glyburide (2 µM) stimulated
insulin secretion nearly 2-fold from HIT-T15 cells (from 0.14 ± 0.01,
n = 8 to 0.25 ± 0.01 ng/ml/2 h, n = 8; P < 0.001) in the absence of
glucose. Addition of TEA (20 mM) in the presence of glyburide enhanced
insulin secretion further (to 0.56 ± 0.06 ng/ml/2 h, n = 8; P < 0.01
compared with glyburide alone) (Fig. 2).

Fig. 2. Idr Inhibition Enhances Glyburide-Stimulated Insulin
Secretion from HIT-T15 Cells

In the absence of glucose, the KATP channel antagonist glyburide
(2 µM; hatched bar) depolarizes HIT-T15 cells and stimulates insulin
secretion compared with control (white bar). The general Kv channel
antagonist TEA (20 mM) alone (gray bar) had no effect on unstimulated
insulin secretion but further enhanced insulin secretion from HIT-T15
cells depolarized by glyburide (black bar). **, P < 0.01 and ***,
P < 0.001 compared with control.

Similarly, islets were incubated in nonstimulatory concentrations of
glucose (2.5 mM) with TEA (20 mM) and/or the sulfonylurea drug
glyburide. Glyburide at 2 µM elicited a large insulin response that
was not enhanced by 20 mM TEA (Fig. 3A). Because the micromolar
concentrations of glyburide commonly used to stimulate insulin
secretion are approximately 4,000 times the published EC50 in rodent
islets (34), nonspecific effects on ion channels or non-β-cells are
possible. With 10 nM glyburide, TEA (20 mM) significantly enhanced
insulin secretion compared with glyburide alone in both the presence
(n = 10; P < 0.05) and absence (n = 12; P < 0.05) of stimulatory
glucose (Fig. 3B). PKA pathway signaling enhances GSIS, partly through
actions on ion channels (35, 36). In the present study, TEA (20 mM)
enhanced the insulinotropic effect of the PKA pathway agonist
3-isobutyl-1-methylxanthine (IBMX) (1 µM) in the presence of
stimulatory glucose (Fig. 3D). These results demonstrate that membrane
depolarization is sufficient to allow TEA's insulinotropic effect and
that TEA enhances the insulinotropic effects of agonists acting
through the KATP and PKA pathways.

Fig. 3. Idr Inhibition Enhances the Insulinotropic Effect of KATP and
PKA Pathway Agonists

In the presence of 2.5 mM glucose (A and B) the KATP channel antagonist glyburide [hatched bars, at 2 µM (A) or 10 nM (B)]
stimulates insulin secretion from isolated rat islets compared with controls (white bars). The general Kv channel antagonist TEA
(20 mM; gray bars) had no effect on unstimulated insulin secretion, but further enhanced insulin secretion from isolated rat islets
together (black bars) with 10 nM glyburide (B). With stimulatory glucose (15 mM, panels C and D), TEA (20 mM) enhanced insulin
secretion and the effects of secretagogues acting through the KATP (panel C, 10 nM glyburide) and PKA (panel D, 1 µM IBMX)
pathways. *, P < 0.05; **, P < 0.01; and ***, P < 0.001 compared with controls unless otherwise indicated.
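The two glyburide concentrations used in the islet experiments can be compared with the published EC50 by simple arithmetic. This is a minimal sketch: the helper name `fold_over_ec50` is hypothetical, and the EC50 of 0.5 nM is the value quoted for rodent islets in the Discussion of this paper, used here as an assumption for the ratio.

```python
NM_PER_UM = 1000.0  # nanomolar per micromolar


def fold_over_ec50(conc_nM, ec50_nM):
    """How many times a concentration exceeds the EC50 (same units)."""
    if ec50_nM <= 0:
        raise ValueError("EC50 must be positive")
    return conc_nM / ec50_nM


# EC50 for glyburide in rodent islets as quoted in the Discussion (assumed here).
GLYBURIDE_EC50_NM = 0.5

high_dose_fold = fold_over_ec50(2 * NM_PER_UM, GLYBURIDE_EC50_NM)  # 2 uM dose
low_dose_fold = fold_over_ec50(10, GLYBURIDE_EC50_NM)              # 10 nM dose

if __name__ == "__main__":
    print(f"2 uM glyburide is {high_dose_fold:.0f}x the EC50")
    print(f"10 nM glyburide is {low_dose_fold:.0f}x the EC50")
```

The ratio for the 2 µM dose matches the "approximately 4,000 times" figure in the text, while the 10 nM dose is still well above the EC50 but far closer to it.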



Pancreatic Islet and β-Cell Kv
Channel Expression

The above results demonstrate that the blockade of Idr can enhance
insulin secretion when glucose or KATP channel antagonists close KATP
channels. To determine which K+ channels mediate Idr in
insulin-secreting cells, HIT-T15 cell and rat islet total RNA were
examined for Kv gene transcripts via RT-PCR (Table 1). RT-PCR of
HIT-T15 cell RNA with Kv1.1, 1.3, and 1.4 specific primers resulted in
amplification products of the expected size. RT-PCR of rat islet RNA
yielded cDNA fragments of the correct size for Kv1.2, 1.3, 1.4, 1.6,
and 2.1. Sequencing confirmed that each fragment corresponded to the
appropriate channel with a high degree of nucleotide and predicted
amino acid identity with the respective human channel. All primer sets
produced PCR products of the appropriate size upon RT-PCR of rat brain
or mouse skeletal muscle (Kv1.7) total RNA as a positive control. No
PCR product was visible in the water blank controls.

* Identity as compared to human cDNA and protein sequences.

Fig. 4. Kv and KCa Channel Protein Expression

HIT-T15 cell, βTC-6f7 cell, rat islet, and rat brain lysates (50 µg
protein) were probed for Kv1, Kv2, and KCa channel proteins using
specific antibodies (see Materials and Methods). When available,
control antigen (blocking peptide) was incubated with the channel
antibody before probing of membranes to demonstrate the specificity of
detection. Kv2.1 protein was detected with two separate antibodies
(anti-Kv2.1a and anti-Kv2.1b). Anti-Kv2.1b was found to be more
species specific for rat.

Western blot studies confirmed the protein expression of Kv1.4, 1.6,
2.1, and 1.2 (at lower levels) in rat islets (Fig. 4). Expression of
Kv1.4 and 2.1 protein was detected in the HIT-T15 and βTC-6f7
insulinoma cell lines. Despite failure to detect Kv2.1 mRNA in HIT-T15
cells by RT-PCR, protein expression by this cell line is clearly
abundant. It is possible that species selectivity of our primers
resulted in our inability to detect the mRNA transcript in this
hamster cell line. Levels of Kv2.1 protein detected in islets were
roughly equivalent to those in the rat brain control using the Kv2.1b
antibody. However, the levels of Kv2.1 detected differed markedly
between the two antibodies, possibly reflecting variable species
affinity (Fig. 4). Kv1.1 protein was detectable at low levels in
HIT-T15 cells with longer exposures (not shown) but was not detected
in rat islets. Specific protein bands for the KCa channels BK, SK2,
and SK3 were not detected in insulin-secreting cells, with the
exception of a light but detectable band for SK2 in βTC-6f7 cells
(Fig. 4). Detection of Kv2.1 was used as a positive control in all
protein samples from islets and β-cell lines.

Characterization of TEA-Sensitive Idr in Insulin-Secreting Cells

As illustrated in Fig. 5, Idr recorded from HIT-T15 cells or rat islet
cells was noninactivating over 500 msec. Despite similar kinetic
properties, current amplitudes (at the end of a 500-msec pulse to
70 mV from a holding potential of -70 mV) in HIT-T15 cells were
approximately double those observed in rat islet cells (Fig. 5).
Because of the inclusion of 1 mM EGTA and 5 mM MgATP within the
pipette solution, the outward currents are expected to primarily
reflect the opening of Kv channels, with minimal contributions from
KCa or KATP channels. Since native β-cells operate over a range of
membrane potentials, we studied outward currents in islet cells from a
range of holding potentials and found no differences between currents
elicited from -90, -70, or -50 mV. Steady-state inactivation protocols
(over 15 sec) showed sustained currents displaying a half-maximal
voltage sensitivity (V1/2) of -32.47 ± 1.53 mV (n = 12).
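A half-maximal voltage of this kind is conventionally read off a Boltzmann curve, and the sketch below shows what such a curve looks like. It is an illustration under stated assumptions rather than the analysis used in the study: the function name `boltzmann_availability` is hypothetical, the slope factor of 7 mV is an arbitrary placeholder because no slope is reported in the text, and only the V1/2 of -32.47 mV comes from the paper.

```python
import math


def boltzmann_availability(v_mV, v_half_mV=-32.47, slope_mV=7.0):
    """Fraction of the inactivating current still available after a prepulse.

    v_half_mV is the V1/2 reported in the text. slope_mV is a placeholder
    assumption; the text does not report a slope factor.
    """
    if slope_mV <= 0:
        raise ValueError("slope factor must be positive")
    return 1.0 / (1.0 + math.exp((v_mV - v_half_mV) / slope_mV))


if __name__ == "__main__":
    # Holding potentials used in the text, plus the half-maximal voltage.
    for v in (-90.0, -70.0, -50.0, -32.47):
        print(f"{v:>7.2f} mV -> availability {boltzmann_availability(v):.2f}")
```

With any positive slope factor, availability is exactly one half at V1/2 and rises toward 1 at more negative holding potentials.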

Consistent with its ability to inhibit Kv and KCa channels far more
potently than KATP channels (12, 31), TEA (20 mM) inhibited outward K+
currents from HIT-T15 and rat islet cells by 85.5 ± 2.7% (n = 9;
P < 0.001) and 87.9 ± 1.2% (n = 11; P < 0.001), respectively (Fig. 5).
The effect of TEA was reversible upon washing after exposures of as
long as 2 h, similar to the exposures used for the above insulin
secretion studies (data not shown). The biophysical and
pharmacological properties of these currents most closely resembled
those mediated by members of the Kv1, 2, and 3 families, but not the
Kv4 family (37-39). Outward currents from rat islet cells at more
physiological temperatures (31-33 C) were somewhat larger at the end
of a 500-msec depolarization to 70 mV from -70 mV. However, current
inhibition by 20 mM TEA (86.4 ± 1.2%, n = 9; P < 0.001) was not
significantly different compared with room temperature.

Table 1. Identification of Kv Channel Homologs in HIT-T15 Cells and Rat Islets by RT-PCR of Total RNA

Column headings: Kv Transcript; Expected Product Size (bp); HIT-T15 Cells (% Sequence Identity*, % Amino Acid Identity*); Rat Islets (% Sequence Identity*)

Fig. 5. β-Cell Idr Is Blocked by TEA

Outward K+ currents were recorded by depolarizing with a series of 500-msec pulses from a holding potential of -70 mV in
20-mV increments to a maximal depolarization to 70 mV. Data were normalized to cell capacitance. Representative traces from
a typical HIT-T15 cell (open marks) and rat islet cell (black marks) are shown under control conditions (triangles) and in the
presence of 20 mM TEA (circles) in panel A. In panel B, the current-voltage relationship of maximum sustained current was plotted
for both HIT-T15 cells (open marks) and rat islet cells (black marks). At more physiological temperatures (31-33 C; dashed line),
sustained outward currents were moderately larger and also were largely blocked by 20 mM TEA. P < 0.001 compared with
controls.

Effect of Kv and KCa Channel Antagonists on
Insulin Secretion

To investigate whether specific Kv or KCa channels contribute to the
regulation of insulin secretion, experiments were performed using
selective channel antagonists. Margatoxin (100 nM), which inhibits
Kv1.3 and 1.6 with an IC50 of 30 pM and 5 nM, respectively (40), did
not affect insulin secretion from either HIT-T15 cells or rat islets
(Table 2). Dendrotoxin (200 nM), an inhibitor of both Kv1.1 and 1.2
channels with an IC50 of 20 nM (41, 42), did enhance GSIS from HIT-T15
cells (Table 2) accompanied by a 26.3 ± 9.7% (n = 7; P < 0.001)
reduction in Idr, but did not enhance insulin secretion from rat
islets. This is consistent with our ability to detect mRNA transcripts
for Kv1.1 and variable but low Kv1.1 protein in HIT-T15 cells, but not
rat islets. Specific antagonists are not available against cloned
Kv1.4 channels, the other Kv1 family member that was detected.
However, heterotetrameric channels formed from this subunit are
insensitive to TEA (41) and are therefore less likely contributors to
TEA's insulinotropic effect. Because no specific antagonists to Kv2
family channels are commercially available, this characterization was
limited to antagonists of Kv1 channel family members.
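The toxin concentrations above can be checked against their IC50 values with a one-site block model. This is a minimal sketch under that assumption (simple 1:1 binding, no cooperativity), not a description of how the study assessed block; the function name `fraction_blocked` is hypothetical, and the concentrations and IC50 values are those quoted in the text.

```python
def fraction_blocked(conc_nM, ic50_nM):
    """Expected fractional block for a one-site inhibitor: C / (C + IC50)."""
    if conc_nM < 0 or ic50_nM <= 0:
        raise ValueError("concentration must be >= 0 and IC50 must be > 0")
    return conc_nM / (conc_nM + ic50_nM)


# (toxin, target, concentration used in nM, IC50 in nM), as quoted in the text.
CONDITIONS = [
    ("margatoxin", "Kv1.3", 100.0, 0.03),   # IC50 of 30 pM
    ("margatoxin", "Kv1.6", 100.0, 5.0),
    ("dendrotoxin", "Kv1.1/1.2", 200.0, 20.0),
]

if __name__ == "__main__":
    for toxin, target, conc, ic50 in CONDITIONS:
        block = fraction_blocked(conc, ic50)
        print(f"{toxin} on {target}: {block:.1%} expected block")
```

Under the one-site assumption, each toxin was applied at a concentration expected to block at least about 90% of its homomeric target channels, so a lack of effect is unlikely to reflect underdosing.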

Because both large- and small-conductance KCa currents have been
detected in insulin-secreting cells (43-48), we investigated the
effect of KCa channel antagonists on GSIS from rat islets. Neither the
small-conductance KCa antagonist apamin (200 nM) nor the large- and
medium-conductance KCa antagonist iberiotoxin (100 nM) had a
significant effect on GSIS from rat islets compared with controls
(Table 2). However, this does not rule out the possibility that an
apamin-insensitive small-conductance KCa current may have a role in
regulating insulin secretion (49, 50).

Effect of Dominant-Negative Knockout of Kv1
and 2 Channels on β-Cell Idr

To further investigate the role of the Kv1 and 2 family channels in
mediating β-cell Idr, we used a recombinant adenovirus approach to
express dominant-negative Kv1 (AdKv1.4N) and 2 (AdKv2.1N) channel
subunits. Mutation or truncation involving all or part of the
pore-forming loop results in nonfunctional subunits that can
coassemble with and eliminate ion flow through endogenous channels of
the same family. Similar approaches have been used to study and
identify subunit assembly of native Kv channels (24, 51, 52).



Expression of the Kv1.4N subunit in HIT-T15 cells and rat islet cells
decreased Idr by 26.8 ± 5.9% (n = 14; P < 0.05) and 22.3 ± 5.3%
(n = 8; P < 0.05), respectively, compared with controls (Fig. 6).
Expression of Kv2.1N reduced Idr in HIT-T15 cells and rat islet cells
to a far greater extent (72.9 ± 2.9%; n = 24; P < 0.001 and
61.6 ± 3.2%; n = 22; P < 0.001, respectively) compared with enhanced
green fluorescent protein (EGFP)-expressing controls (Fig. 7). TEA
(20 mM) further reduced outward K+ currents in cells expressing
Kv2.1N, eliminating a total of 94.3 ± 1.8% (n = 7; P < 0.001)
(HIT-T15) and 86.9 ± 1.8% (n = 11; P < 0.001) (rat islet cells) of Idr
compared with EGFP controls (Fig. 7). Remaining currents in
Kv2.1N-expressing rat islet cells after the addition of 20 mM TEA
resembled A currents mediated by cloned Kv1.4 and could be inactivated
by holding at -50 mV, a protocol known to inactivate A currents (53)
(Fig. 8). These results suggest that the Kv1 and Kv2 channel families
contribute approximately 20-30% and about 60-70% of the Idr in
insulin-secreting cells, respectively, potentially accounting for
80-100% of total Idr observed under the present conditions.
Steady-state inactivation of K+ currents recorded from rat islet cells
was unchanged by the expression of the Kv1.4N or Kv2.1N constructs,
showing no differences in voltage sensitivity of the inactivating
portion of the remaining currents with values of -33.6 ± 1.6 and
-37.7 ± 1.7 mV (n = 4 and 9).
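The 80-100% estimate follows from adding the two family contributions, which can be reproduced with a few lines of arithmetic. The sketch below assumes, as the text implicitly does, that the Kv1 and Kv2 contributions are additive and non-overlapping; the names `add_ranges` and `combined_mean` are hypothetical, and the numbers are the rounded ranges and mean reductions reported above.

```python
def add_ranges(first, second):
    """Add two (low, high) percentage ranges end to end."""
    return (first[0] + second[0], first[1] + second[1])


# Approximate family contributions to Idr quoted in the text (% of total).
KV1_RANGE = (20, 30)
KV2_RANGE = (60, 70)

# Mean reductions in Idr reported for each dominant-negative construct (%).
MEAN_REDUCTION = {
    "HIT-T15": {"Kv1.4N": 26.8, "Kv2.1N": 72.9},
    "rat islet": {"Kv1.4N": 22.3, "Kv2.1N": 61.6},
}


def combined_mean(cell_type):
    """Sum of the two mean reductions, assuming additive contributions."""
    values = MEAN_REDUCTION[cell_type]
    return values["Kv1.4N"] + values["Kv2.1N"]


if __name__ == "__main__":
    print("combined range (%):", add_ranges(KV1_RANGE, KV2_RANGE))
    for cell_type in MEAN_REDUCTION:
        print(f"{cell_type}: {combined_mean(cell_type):.1f}% of Idr")
```

Both summed means fall inside the 80-100% range, with the rat islet value near the lower end and the HIT-T15 value near the upper end.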

Table 2. Effect of Kv1 and KCa Specific Antagonists on Glucose-Stimulated Insulin Secretion from HIT-T15 Cells and
Rat Islets

                                            HIT-T15 Cells (ng/ml/2 h)      Rat Islets (ng/islet/h)
Antagonist               Channels Blocked   Control        Drug            Control        Drug
α-Dendrotoxin (200 nM)   Kv1.1, 1.2         0.39 ± 0.03    0.51 ± 0.03a    0.15 ± 0.04    0.15 ± 0.02
Margatoxin (100 nM)      Kv1.3, 1.6         0.57 ± 0.07    0.49 ± 0.04     0.09 ± 0.01    0.10 ± 0.01
Apamin (200 nM)          SKCa channels                                     0.25 ± 0.03    0.27 ± 0.05
Iberiotoxin (100 nM)     BKCa channels                                     0.11 ± 0.02    0.10 ± 0.01

a P < 0.05 compared to control.

Fig. 6. Kv1.4N Expression Reduces β-Cell Idr

Current-voltage relationships were obtained from HIT-T15 cells (A) and
rat islet cells (B) expressing control EGFP (triangles) or the
dominant-negative Kv1.4N construct (circles). Inset, Western blotting
for the Kv1.4N construct showed expression of the truncated protein in
Kv1.4N-GW1H-transfected (2) and AdKv1.4N-infected (3) HIT-T15 cells;
only the full-length protein was detected in Kv1.4-GW1H-transfected
(4) or AdKv1.4-infected (5) cells. Upon longer exposure, endogenous
Kv1.4 would be detectable in control lysates (1). *, P < 0.05 compared
with controls.

Effect of Dominant-Negative Knockout of Kv1
and Kv2 Family Channels on Insulin Secretion

Isolated islets were infected in vitro with AdKv1.4N, AdKv2.1N, or
AdEGFP (control). Coexpression of EGFP allowed visualization of
infected cells and estimation of infection efficiency. Laser confocal
microscopy (not shown) and our previous studies (54) have shown that
infection efficiencies of 30-50% are typical and cells within the
islet core can be infected. Expression of Kv1.4N in rat islets had no
effect on basal insulin secretion but significantly enhanced GSIS
compared with control (0.031 ± 0.004 to 0.043 ± 0.007 ng/islet/h,
n = 12; P < 0.05) (Fig. 9A). Likewise, expression of Kv2.1N in rat
islets did not affect basal insulin secretion and caused a much larger
enhancement of GSIS compared with control (0.044 ± 0.009 to
0.070 ± 0.018 ng/islet/h, n = 9; P < 0.001) (Fig. 9B). These results
appear to be in good agreement with our electrophysiological
observations, providing further evidence for a link between enhanced
insulin secretion and reduction of Idr.
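The approximately 40% and 60% enhancements of GSIS quoted elsewhere in the paper can be recovered from the mean secretion values reported for the Kv1.4N and Kv2.1N experiments. This is a minimal arithmetic sketch using only those means; the function name `percent_increase` is hypothetical, and the calculation ignores the reported standard errors.

```python
def percent_increase(control, treated):
    """Percent change of treated relative to control."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return (treated - control) / control * 100.0


# Mean GSIS values (ng/islet/h) reported in the text: (control, construct).
GSIS_MEANS = {
    "Kv1.4N": (0.031, 0.043),
    "Kv2.1N": (0.044, 0.070),
}

if __name__ == "__main__":
    for construct, (control, treated) in GSIS_MEANS.items():
        change = percent_increase(control, treated)
        print(f"{construct}: GSIS increased by {change:.0f}%")
```

The computed increases are close to 40% for Kv1.4N and 60% for Kv2.1N, consistent with the rounded figures given in the abstract and Discussion.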



DISCUSSION 

Repolarization of pancreatic β-cells after a glucose-induced
depolarization is mediated by a voltage-dependent outward K+ current,
which assists in closure of voltage-dependent Ca2+ channels, thereby
modulating insulin secretion (5, 11-14). Accordingly, the general
inhibitor TEA enhances glucose-stimulated [Ca2+]i oscillations and
insulin secretion (11-13, 16, 31). Consistent with an important role
for these currents in β-cells, we found that 20 mM TEA reduced Idr (by
85-90% at both room temperature and near-physiological temperature)
and enhanced glucose-stimulated insulin secretion (~2- to 4-fold) in
both HIT-T15 cells and isolated rat islets. As expected, since β-cell
Idr currents are postulated to activate only after glucose-induced
depolarization, TEA had no insulinotropic effect in the absence of
stimulatory glucose. The ability of TEA to block Idr and enhance
glucose-dependent insulin secretion suggests that repolarizing K+
channels underlie Idr. However, the effects of TEA do not resolve
which K+ channels are responsible for Idr in β-cells.

Fig. 7. Kv2.1N Expression Reduces β-Cell Idr

Current-voltage relationships were obtained from HIT-T15 cells (A) and
rat islet cells (B) expressing control EGFP (triangles) or the
dominant-negative Kv2.1N construct (circles). Outward currents in
cells expressing Kv2.1N could still be reduced by addition of 20 mM
TEA (open squares). Inset, Northern blotting for the Kv2.1N transcript
showed expression in AdKv2.1N-infected (2) HIT-T15 cells (n = 2); no
transcript was detected in control-infected (1) cells (n = 2). ***,
P < 0.001 compared with controls; and ###, P < 0.001 compared with
Kv2.1N-expressing cells.

For a number of reasons, it is unlikely that TEA exerts its
glucose-dependent insulinotropic effect by inhibiting KATP channels.
Unlike KATP antagonists such as glyburide, TEA (20 mM) did not enhance
unstimulated insulin secretion (Figs. 2 and 3) (9, 10). In fact, the
combination of TEA and glyburide enhanced insulin secretion to a
greater degree than either alone, suggesting separate targets.
Moreover, the glucose-dependent insulinotropic effect of TEA was
observable at concentrations far lower than the published EC50 for
KATP channels (Fig. 1C). Finally, in the presence of high glucose, the
majority of KATP channels are closed, owing to an increase in the
ATP:ADP ratio (11, 55).

Fig. 8. Outward K+ Currents in Kv2.1N-Expressing Cells Exposed to TEA

Remaining outward currents in AdKv2.1N-infected rat islet cells
exposed to TEA (20 mM) were small and displayed an A current component
when depolarized to 30 mV from a holding potential of -90 mV
(triangle). Holding the cells at a more positive potential (-50 mV;
square) before depolarization did not affect sustained currents (A),
but dramatically reduced the Kv1.4-like A current component (B). Each
trace is an average of recordings from eight AdKv2.1N-infected rat
islet cells; the time represented by the black bar in panel A is shown
on an expanded scale in panel B. The very fast component (within
5 msec of depolarization) results from uncompensated capacitance
transient, and the small differences in initial holding current result
from the different holding potentials.

Glyburide enhances insulin secretion from rodent islets with an EC50
of 0.5 nM (56), while human islets bind glyburide with a dissociation
constant (Kd) of 1 nM (34). Here, a glyburide concentration of 10 nM
stimulated a 2-fold increase in insulin secretion from isolated rat
islets in the absence of stimulatory glucose. TEA enhanced
glyburide-stimulated insulin release, indicating that membrane
depolarization is sufficient to allow TEA's insulinotropic effect. The
inability of TEA to significantly enhance rat islet insulin secretion
stimulated by 2 µM glyburide (Fig. 3A) may result from nonspecific
effects of this high dose of glyburide on other cell types within the
islet, a problem that would not be present in a homogeneous insulinoma
cell line. Interestingly, in the presence of stimulatory glucose, the
effects of glyburide or the phosphodiesterase inhibitor IBMX were
enhanced by TEA (Fig. 3, C and D), suggesting that TEA-like drugs may
be used in combination with KATP or PKA pathway agonists for a greater
insulinotropic effect.

Fig. 9. Kv1.4N and Kv2.1N Expression Enhances GSIS from Rat Islets

Insulin secretion from AdKv1.4N (panel A, black bars) and AdKv2.1N
(B, black bars)-infected rat islets was enhanced compared with
controls (white bars). These dominant-negative subunits enhanced
insulin secretion only in the presence of stimulatory glucose
(15 mM), while no effect was observed under nonstimulatory conditions
(2.5 mM glucose). *, P < 0.05; and **, P < 0.01 compared with
controls.

It is conceivable that Ca2+-sensitive K+ currents mediate the effects
of TEA in our studies. Indeed, KCa currents have been detected in
insulin-secreting cells; however, reports regarding the
pharmacological identification of these currents and their
contribution to glucose-induced electrical activity are conflicting
(12, 30, 44-46, 48-50, 57-59). There is little functional evidence
supporting a major role for KCa channels in regulating insulin
secretion, and we were unable to detect KCa protein or an
insulinotropic effect of general KCa channel antagonists (100 nM
iberiotoxin and 200 nM apamin) in rat islets (Table 2). It is
possible, nevertheless, that an apamin-insensitive small-conductance
KCa current, possibly mediated by SK1 (60), can modulate insulin
secretion (45, 49, 50).

Although it seems clear that Kv channels are mediators of β-cell
membrane repolarization, a role for specific channels in mediating Idr
has not been established. Since Kv channels consist of homo- or
heterotetrameric proteins from the same family (17, 23, 25, 29), we
chose to express truncated subunits lacking the pore-forming region to
selectively knock out functional channels in a family-specific manner.
Similar approaches have been used to study and identify α-subunit
assembly of native Kv channels (24, 51, 52). In our study, the
dominant-negative Kv1.4N and Kv2.1N constructs inhibited outward K+
currents when coexpressed with wild-type channels of the same family
in HIT-T15 cells, but did not inhibit currents resulting from
different channel families (members of the Kv1, 2, 3, and 4 channel
families were tested; data not shown).

Expression of Kv2.1N in HIT-T15 cells or rat islet cells had a
dramatic effect on Idr, reducing it by approximately 70 and 60%,
respectively. This correlated with an approximately 60% increase in
GSIS from Kv2.1N-infected islets compared with EGFP-expressing
controls. Supported by the fact that the EC50 for the insulinotropic
effect of TEA is within the range reported for Kv2.1's IC50 for block
by TEA (61-63), our data suggest an important role for the Kv2 family
in insulin secretion. Kv2.1 protein was detected at levels comparable
to the rat brain control in both the insulinoma cell lines and rat
islets. This is consistent with previous studies showing high-level
protein expression of Kv2.1 in βTC3-neo insulinoma cells and Kv2.1
mRNA in insulin-secreting cells (5, 11). Transcripts for Kv2.2, the
only other Kv2 family member that forms functional channel pores, were
not detected. Kv2.1N expression did not enhance insulin secretion to
the same degree as seen with TEA, which may be explained in a number
of ways. The insulinotropic effect of TEA was measured in response to
an acute application of the drug, whereas the effect of Kv2.1N
expression was measured after a more chronic expression protocol
(2 days) that may have led to changes in the machinery controlling
insulin secretion. In addition, our adenoviral expression of the
Kv2.1N construct was limited to approximately 50% of the cells.
Infection of rat islets with control EGFP virus decreased basal
insulin secretion and reduced insulin secretion induced by glucose.
Although the degree of insulin secretion enhancement by Kv2.1N
expression was compared with EGFP controls, it is conceivable that
Kv2.1N might contribute additional effects on insulin secretion
independent of Idr reduction. To minimize the possible effects of
differential expression efficiency between control and experimental
groups, islets were infected with equal numbers of viral particles and
inspected for qualitatively similar levels of EGFP expression.
Finally, it is still uncertain whether the relationship between Idr
reduction and enhancement of GSIS is linear, meaning that a reduction
in Idr greater than 60-70% may be required for a 2- to 4-fold increase
in insulin secretion to occur.

Expression of Kv1.4N in HIT-T15 or rat islet cells 
reduced IDR by approximately 30 and 20%, respectively, 
and increased GSIS from rat islets by about 
40% compared with EGFP controls. Of the Kv1 channel 
family, Kv1.6 protein was detected at high levels in 
rat islets, while Kv1.4 protein was detected at high 
levels in rat islets and the insulinoma cell lines HIT-T15 
and βTC-6f7. Kv1.2 protein was detected at low levels 
in rat islets, and Kv1.1 protein was detected variably at 
low levels in HIT-T15 cells. We did not examine the 
protein expression of Kv1.5 or 1.7, as neither was 
detectable in insulin-secreting cells by RT-PCR, and 
both are known to be insensitive to TEA. Variable 
detection of Kv1.1 in HIT-T15 cells is consistent with 
the ability of dendrotoxin to reduce IDR and enhance 
insulin secretion in these cells. Our results suggest a 
minimal contribution of homotetrameric Kv1.6 or 
Kv1.4 channels to the insulinotropic effect of TEA 
since the former is sensitive to margatoxin and the 
latter is insensitive to TEA. However, heterotetrameric 
channels containing these subunits cannot be ruled 
out since heterotetrameric channels do not necessarily 
possess the pharmacological sensitivities of their 
constituent subunits (29). Also, the presence of regulatory 
β-subunits, channel phosphorylation, and the 
channel's oxidative state are known to significantly 
alter channel pharmacology and kinetics (27, 28, 64-67). 
We did observe a small A current component in 
Kv2.1N-expressing rat islet cells in the presence of 20 
mM TEA that was inactivated by holding the cell at -50 
mV. This provides confirmatory evidence for the presence 
of Kv1.4-containing channels but suggests a limited 
role for them under normal conditions. 

Current type 2 diabetes treatments aimed at enhancing 
insulin secretion are limited to the sulfonylurea 
drugs, which act in a glucose-independent manner. 
This is because their mechanism involves 
inhibition of KATP through an interaction with the associated 
SUR1, depolarizing the cell, and triggering 
influx of Ca2+ and ultimately insulin secretion. Because 
TEA acts in a glucose-dependent fashion, enhancing 
β-cell depolarization rather than initiating it, 
drugs acting at TEA's specific target may be considered 
useful therapies that could also be expected to 
enhance the insulinotropic effect of KATP or PKA pathway 
agonists. In this study we identified high-level 
expression of Kv1.4, 1.6, and 2.1 in rat islets and have 
used an adenoviral approach to functionally knock 
out these channels in isolated islets. Dominant-negative 
knockout of Kv2.1 enhanced insulin secretion 
by 60% in a glucose-dependent manner, while 
knockout of the Kv1 channel family members had a 
similar, but lesser, effect. It seems clear, however, 
that Kv2.1, and potentially members of the Kv1 
channel family, may represent novel targets for the 
treatment of type 2 diabetes. 

MATERIALS AND METHODS 



Cell Culture and Islet Isolation 

HIT-T15 cells, a gift from R. P. Robertson (Pacific NW Re- 
search Institute, Seattle, WA), passage 80-95, were cultured 
in Roswell Park Memorial Institute (RPMI) 1640 medium sup- 
plemented with 10% FBS, 1% L-glutamine, and 1% penicil- 
lin-streptomycin. Islets of Langerhans were isolated from 
male Wistar rats, 250-350 g, by perfusion of the pancreas 
through the common bile duct with 10 ml of a collagenase 
solution (10 mg/100 g body wt) and incubation of the excised 
pancreas with shaking at 37 C. The digestion was washed, 
filtered through 355 μm mesh, and separated on a density 
gradient created by resuspending the pellet in histopaque- 
1077 (Sigma, St. Louis, MO) and layering on serum-free me- 
dia [low-glucose (LG)-RPMI 1640 described below without 
serum]. Islets were collected from the interphase and further 
purified from contaminating single cell types by sedimenta- 
tion. Isolated islets were cultured in LG-RPMI 1640 (7.5% 
FBS, 1% penicillin/streptomycin, 0.25% HEPES, and 2.5 mM 
glucose) at 37 C and 5% CO2. 

Insulin Secretion Studies 

Twenty islets per well were plated in 24-well plates with 
LG-RPMI 1640 for insulin secretion studies. Twenty-four to 
48 h after isolation, islets were washed and LG-RPMI 1640 
was replaced by 2 ml of experimental media. Experimental 
media consisted of either LG-RPMI 1640 or high glucose 
(HG)-RPMI 1640 (15 mM glucose) with or without various 
experimental agents (see figures). 

For HIT-T15 cell studies, cells were plated in 12-well plates 
at 5 X 10^ cells per well. Forty-eight hours after plating, 
HIT-T15 cells were washed with, and preincubated for 2 × 30 
min in, Krebs Ringer bicarbonate (KRB) buffer (115 mM NaCl, 
5 mM KCl, 24 mM NaHCO3, 2.5 mM CaCl2, 1 mM MgCl2, 10 
mM HEPES, and 0.1% BSA). After preincubation, cells were 
washed with KRB buffer and then incubated in 1 ml of KRB 
buffer alone or with 10 mM glucose with and without exper- 
imental agents (see figures). 

All secretion studies were performed for 2 h at 37 C and 
5% CO2, after which media samples were taken and centrifuged 
at 700 × g. RIAs were performed using a Rat Insulin 
RIA Kit (Linco Research, Inc., St. Charles, MO). Each experiment 
was performed with an n value of at least 8 in at least 
three separate experiments, and data were normalized to an 
unstimulated control to account for variation between preparations 
and are expressed as nanograms/islet/h or nanograms/ml/2 h. 
Data were analyzed with Student's t test or 
Wilcoxon matched pairs test as appropriate. Dose-response 
curves and EC50 values for insulin secretion studies were 
generated using PRISM software (GraphPad Software, Inc., 
San Diego, CA). 
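
As an illustration of what an EC50 means in a dose-response curve, the following is a minimal sketch of a four-parameter logistic (Hill) model. It is not the fitting routine used by the PRISM software; the function name, parameter names, and all numeric values are hypothetical and chosen only to show that the response at the EC50 lies midway between the bottom and top plateaus.

```python
def hill_response(conc, bottom, top, ec50, hill_slope=1.0):
    """Four-parameter logistic (Hill) dose-response model.

    conc and ec50 share the same concentration units (e.g. mM).
    bottom and top are the lower and upper response plateaus.
    All names here are illustrative, not taken from the source.
    """
    if conc <= 0:
        # No agonist: the response sits on the lower plateau.
        return bottom
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill_slope)


if __name__ == "__main__":
    # Hypothetical plateaus and EC50, for illustration only.
    for conc in (0.0, 1.0, 5.0, 20.0, 100.0):
        print(conc, round(hill_response(conc, 1.0, 3.0, 5.0), 3))
```

At `conc == ec50` the model returns the midpoint of `bottom` and `top`, which is the defining property of the EC50.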

Dominant-Negative Kv Channel Constructs and 
Adenoviral Vectors 

E1-deleted recombinant adenovirus shuttle vectors expressing 
a C-terminal truncated Kv1.4 subunit (AdKv1.4N) or enhanced 
green fluorescent protein (AdEGFP-RSV) alone under 
the control of the Rous sarcoma virus promoter were provided 
by Dr. Roger J. Najjar (Cardiovascular Research Center and 
Heart Failure Transplantation Center, Massachusetts General 
Hospital, Harvard Medical School, Boston, MA). Recombinant 
adenoviruses expressing a C-terminal truncated Kv2.1 
subunit (AdKv2.1N) or EGFP alone (AdEGFP-CMV) under the 
control of the cytomegalovirus promoter were prepared by 
CRE-lox recombination (68). All of these adenovirus constructs 
coexpress EGFP with the gene of interest to facilitate 
the identification of infected cells. Adenoviruses were amplified 
by passage in HEK 293 cells or CRE-8 cells (for viruses 
constructed by CRE-lox recombination). Infected cells were 
resuspended and lysed in 10 mM Tris, 1 mM MgCl2, pH 8.0 [1 
mM freeze-thaw media (FT)] and purified by centrifuging the 
lysate on a gradient created by layering 3 ml each of 1.20 
g/ml, 1.33 g/ml, and 1.45 g/ml CsCl in 1 mM FT at 27,000 rpm 
for 2 h in a SW41-Ti rotor (Beckman Coulter, Inc., Fullerton, 
CA). Resultant bands were removed and dialyzed overnight 
against 1 mM FT and 10% glycerol and stored at -70 C until 
use. 

Infection of isolated rat islets was performed in 24-well 
plates with either 20 (insulin secretion studies) or 50 (electrophysiological 
studies) islets per well on the day of isolation. 
Infection of HIT-T15 cells for electrophysiological studies 
(AdKv2.1N only) was performed in 35-mm dishes seeded 
24 h previously with 5x10^ cells per dish. Islets or HIT-T15 
cells were cultured in 0.5 ml of normal media with 1 × 10^10 
virus particles/ml for 2 h at 37 C and 5% CO2, after which 1.5 
ml of LG-RPMI 1640 were added. Forty-eight hours later, 
islets or HIT-T15 cells were examined under UV light to detect 
the expression of EGFP. Insulin secretion studies, electrophysiological 
studies, RNA isolation, or protein isolation was 
carried out 48 h post infection. 

For HIT-T15 cell electrophysiological studies, a wild-type 
Kv1.4 or a Kv1.4N construct (in the GW1 H plasmid; provided 
by Dr. Najjar) was expressed by transfection with Lipofectamine 
(Life Technologies, Inc., Gaithersburg, MD) as per 
instructions of the manufacturer. This plasmid was cotransfected 
with the pEGFP plasmid (CLONTECH Laboratories, 
Inc., Palo Alto, CA) that expresses EGFP as a marker for 
transfection. Control cells were transfected with pEGFP 
alone. 

Electrophysiological Studies 

Islets were washed in and incubated with PBS and 0.2 mM 
EDTA with 1.5% trypsin for 11 min, followed by mechanical 
dispersion and plating of single-islet cells overnight in LG-RPMI 
1640 in 35-mm culture dishes. Cells were voltage 
clamped in the whole-cell configuration using an EPC-9 amplifier 
and Pulse software (Heka Electronik, Lambrecht, Germany). 
Electrical identification of β-cells using a current 
clamp was not possible due to the intracellular solution required 
to measure IDR currents; however, the majority of islet 
cells (~70% or more) are β-cells, and all electrophysiological 
experiments were confirmed in a clonal β-cell line (HIT-T15). 
HIT-T15 cells were trypsinized and replated in 35-mm dishes 
24 h before electrophysiological studies. Patch pipettes were 
prepared from 1.5-mm thin-walled borosilicate glass tubes 
using a two-stage micropipette puller (Narishige, Tokyo, Japan). 
Pipettes were heat polished and typically had a tip 
resistance of 3-6 MΩ when filled with intracellular solution 
containing (in mM): KCl, 140; MgCl2·6 H2O, 1; EGTA, 1; 
HEPES, 10; MgATP, 5 (pH 7.25) with KOH. The bath solution 
contained (in mM): NaCl, 140; CaCl2, 2; KCl, 4; MgCl2·6 H2O, 
1; HEPES, 10 (pH 7.3) with NaOH. All electrophysiological 
measurements reported were made at room temperature 
(22-24 C) and normalized to cell capacitance unless stated 
otherwise. For experiments at 31-33 C, temperature was 
maintained with an Olympus America Inc. temperature control 
unit (Melville, NY) and continuous perfusion with warmed 
solutions. Outward currents were elicited with a 500-msec 
depolarization in steps of 20 mV to +70 mV from a holding 
potential of -70 mV. Outward currents were also compared 
from holding potentials of -90, -70, and -50 mV using 
500-msec depolarizing pulses to 30 mV. To minimize variation, 
maximum sustained current was determined from a third 
degree polynomial function fit to the final 25 msec of the 
500-msec depolarizing pulse. 
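
The polynomial smoothing step can be sketched as follows. This is a hedged illustration only: the text does not say how the fit was computed or how "maximum sustained current" was read from it, so the sketch assumes an ordinary least-squares cubic fit over the samples in the final 25 msec and takes the largest fitted value. The function names and the pure-Python solver are hypothetical choices made to keep the example self-contained.

```python
def fit_cubic(times, currents):
    """Least-squares cubic fit of current vs. time.

    Returns (coeffs, t0): coeffs[p] multiplies (t - t0)**p for p = 0..3.
    Times are centred on their mean (t0) to keep the system well conditioned.
    """
    n = len(times)
    t0 = sum(times) / n
    x = [t - t0 for t in times]
    # Normal equations A c = b for the basis 1, x, x^2, x^3.
    A = [[sum(xi ** (i + j) for xi in x) for j in range(4)] for i in range(4)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, currents)) for i in range(4)]
    # Gaussian elimination with partial pivoting.
    for col in range(4):
        pivot = max(range(col, 4), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, 4):
            factor = A[row][col] / A[col][col]
            for k in range(col, 4):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * 4
    for i in range(3, -1, -1):
        tail = sum(A[i][k] * coeffs[k] for k in range(i + 1, 4))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs, t0


def max_sustained_current(times, currents):
    """Largest value of the fitted cubic at the sampled time points.

    Assumes `times` already covers only the final 25 msec of the pulse.
    """
    coeffs, t0 = fit_cubic(times, currents)
    fitted = [sum(c * (t - t0) ** p for p, c in enumerate(coeffs)) for t in times]
    return max(fitted)
```

Fitting a smooth curve before taking the maximum reduces the influence of single noisy samples, which is consistent with the stated aim of minimizing variation.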

The voltage dependence of steady state inactivation was 
investigated by holding the cells at potentials from -80 to 30 
mV for 15 sec followed by a 5-msec prepulse to -70 mV and a 
500-msec depolarization to 30 mV to elicit outward currents. 
Steady state inactivation curves were fit with a Boltzmann 
function: $I/I_{max} = 1/\{1 + \exp[(V - V_{1/2})/s]\}$, where $V_{1/2}$ is the 
voltage at which half the channels are inactivated, and $s$ is the 
slope of the curve. For pharmacological studies, the drug was 
applied by perfusion for at least 5 min before recording. 
Outward currents at the end of the 500-msec depolarizing 
pulse were compared using the t test. 
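
The Boltzmann function used for the steady state inactivation curves can be evaluated directly. The sketch below is illustrative: the half-inactivation voltage, the slope, and the 10-mV spacing of holding potentials are hypothetical values, not results or settings from this study.

```python
import math


def boltzmann_inactivation(v_mv, v_half_mv, slope_mv):
    """Fraction of current remaining (I/Imax) at holding potential v_mv.

    v_half_mv is the voltage at which half the channels are inactivated;
    slope_mv is the slope factor s of the curve (both in mV).
    """
    return 1.0 / (1.0 + math.exp((v_mv - v_half_mv) / slope_mv))


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    v_half, slope = -30.0, 8.0
    for v in range(-80, 31, 10):  # holding potentials from -80 to 30 mV
        print(v, round(boltzmann_inactivation(v, v_half, slope), 3))
```

With a positive slope factor the curve falls from near 1 at hyperpolarized holding potentials toward 0 at depolarized ones, and equals 0.5 at the half-inactivation voltage.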

RNA Analysis 

Total RNA was obtained from rat islets (24-48 h after isolation), 
rat brain, and HIT-T15 cells using Trizol (Life Technologies, 
Inc.) as per the manufacturer's instructions. RT-PCR 
was performed on 1 μg of total RNA using a GeneAmp RNA 
PCR kit (Perkin-Elmer Corp., Branchburg, NJ) according to 
the manufacturer's instructions. PCR primers used were designed 
to conserved sequences of rat Kv1.1 [Forward (F): 
5'-AAGGATCCGTCATTGTGTCC-3'; Reverse (R): 
5'-AAAGGCCTAAACATCGGTCAG-3']; Kv1.2 (F: 
5'-GTAAAGCACACTTCTCAAGCCCC-3'; R: 
5'-CCTCCCGAAACATCTCAATTGC-3'); Kv1.3 (F: 
5'-GAGATCCGCmTACCAGCTGGG-3'; R: 
5'-CATGATATTTCTGGAGAAGG-3'); Kv1.4 (F: 
5'-GATAGCCATTGTGTCCGTCCTGG-3'; R: 
5'-GGCACACAGGGACCCGACAATC-3'); Kv1.5 (F: 
5'-CTGAGAGGGAGAGAGGCAGGG-3'; R: 
5'-GCAGCTCCTGAGGCATAGGG-3'); Kv1.6 (F: 
5'-GTTGGTGATCAACATCTCCGGG-3'; R: 
5'-GGCCGCCTTGCTGGGACAGG-3'); Kv1.7 (mouse) (F: 
5'-TCTCCGTACTCGTCATCCGG-3'; R: 
5'-AAATGGGTGTCCACCCGGTC-3'); Kv2.1 (F: 
5'-CGAGGAGCTGAAGCGGGAGG-3'; R: 
5'-GGAAGATGGTGACGTAGTAGGG-3'); 
and Kv2.2 (F: 5'-GGATGCCTTTGCTAGAAGTATGG-3'; R: 
5'-CGCTGGCACTGTCAGGTTGC-3'). PCR was also performed 
on water blank controls containing no cDNA template and rat 
brain cDNA as a positive control. PCR was performed with 35 
cycles of 94 C for 30 sec, 60 C for 35 sec, and 72 C for 45 sec 
followed by a 10-min extension at 72 C. PCR products of the 
expected size were excised from a 1.2% low melt agarose 
gel and ligated into the pCR2.1 vector and sequenced using 
the universal M13 reverse primer. Resulting sequences were 
subjected to analysis by NCBI Blast (NCBI, Bethesda, MD) 
and nucleotide and amino acid identity analysis with MacDNASIS 
(Hitachi Software, San Francisco, CA). 

Northern analysis was used to detect expression of mRNA 
transcripts for Kv2.1N in total RNA (7.5 μg) from AdKv2.1N- 
or AdEGFP-infected HIT cells as described previously (69). 
Probes were generated by random priming (Random Primers 
DNA Labeling System, Life Technologies, Inc.) of Kv2.1N 
cDNA and incorporation of 32P-dCTP. Blots were washed 
twice by shaking in room temperature 0.1% SDS/2× SSC 
followed by a 30-min wash in 0.1% SDS/0.1× SSC at 55 C. 
Blots were exposed overnight to X-OMAT AR film (Eastman 
Kodak Co., Rochester, NY). 

Protein Analysis 

Immunoblotting of Kv channel proteins was performed as 
previously described (70, 71). Briefly, the islets were washed 
in ice-cold PBS, solubilized in 2% SDS loading buffer, boiled 
for 10 min, and passed through a 23G needle. Fifty micrograms 
of the protein from each sample, determined by Lowry's 
method, were loaded and separated on a 10% polyacrylamide 
gel. The protein was transferred to PVDF-Plus (Fisher 
Scientific Ltd., Nepean, Ontario, Canada) membrane and immunodecorated 
with primary antibody or antibody-antigen 
solutions (diluted according to the supplier's instructions) for 
1.5 h at room temperature. Primary antibodies were from 
Alomone Labs (Jerusalem, Israel) (Kv1.2, 1.3, 1.4, 1.6, 2.1) 
and Upstate Biotechnology, Inc. (Lake Placid, NY) (Kv1.1, 
2.1). Primary antibodies were detected with appropriate secondary 
antibodies (sheep antimouse, 1:10,000; donkey antirabbit, 
1:7,500; Amersham Pharmacia Biotech Ltd., Buckinghamshire, 
U.K.) for 1 h, and then visualized by chemiluminescence 
(ECL-Plus, Amersham Pharmacia Biotech Ltd.) 
and exposure of the filters to Kodak film (Eastman Kodak Co., 
Rochester, NY) for 5 sec to 10 min. At least three blots were 
performed for each protein investigated. 
Downregulation of K(+) channel genes expression in type I 
diabetic cardiomyopathy. 

Qin D, Huang B, Deng L, El-Adawi H, Ganguly K, Sowers JR, El-Sherif N. 

Department of Veterans Affairs, New York Harbor Healthcare System, 
Brooklyn Campus, Brooklyn, New York, 11209, USA. 

Type I diabetic cardiomyopathy has consistently been shown to be associated 
with decrease of repolarising K(+) currents, but the mechanisms responsible for 
the decrease are not well defined. We investigated the streptozotocin (STZ) rat 
model of type I diabetes. We utilized RNase protection assay and Western blot 
analysis to investigate the message expression and protein density of key 
cardiac K(+) channel genes in the diabetic rat left ventricular (LV) myocytes. 
Our results show that message and protein density of Kv2.1, Kv4.2, and Kv4.3 
are significantly decreased as early as 14 days following induction of type I 
diabetes in the rat. The results demonstrate, for the first time, that insulin- 
deficient type I diabetes is associated with early downregulation of the 
expression of key cardiac K(+) channel genes that could account for the 
depression of cardiac K(+) currents, I(to-f) and I(to-s). These represent the main 
electrophysiological abnormality in diabetic cardiomyopathy and is known to 
enhance the arrhythmogenecity of the diabetic heart. The findings also extend 
the extensive list of gene expression regulation by insulin. Copyright 2001 
Academic Press. 
EXHIBIT L 



characterize the protein. A starting material that can only be used to produce 
a final product does not have a substantial asserted utility in those instances 
where the final product is not supported by a specific and substantial utility. 
In this case none of the proteins that are to be produced as final products 
resulting from processes involving the claimed cDNA have asserted or 
identified specific and substantial utilities. The research contemplated by 
Applicants to characterize potential protein products, especially their 
biological activities, does not constitute a specific and substantial utility. 
Identifying and studying the properties of the protein itself or the 
mechanisms in which the protein is involved does not define a "real world" 
context of use. Note, because the claimed invention is not supported by a 
specific and substantial asserted utility for the reasons set forth above, 
credibility has not been assessed. Neither the specification as filed nor any 
art of record discloses or suggests any property or activity for the cDNA 
compounds such that another non-asserted utility would be well established 
for the compounds. 

Claim 1 is also rejected under 35 U.S.C. § 112, first paragraph. 
Specifically, since the claimed invention is not supported by either a specific 
and substantial asserted utility or a well established utility for the reasons set 
forth above, one skilled in the art would not know how to use the claimed 
invention. 

Example 10: DNA Fragment encoding a Full Open Reading Frame 
(ORF) 

Specification: The specification discloses that a cDNA library was prepared 
from human kidney epithelial cells and 5000 members of this library were 
sequenced and open reading frames were identified. The specification 
discloses a Table that indicates that one member of the library having SEQ 
ID NO: 2 has a high level of homology to a DNA ligase. The specification 
teaches that this complete ORF (SEQ ID NO: 2) encodes SEQ ID NO: 3. 
An alignment of SEQ ID NO: 3 with known amino acid sequences of DNA 
ligases indicates that there is a high level of sequence conservation between 
the various known ligases. The overall level of sequence similarity between 
SEQ ID NO: 3 and the consensus sequence of the known DNA ligases that 
are presented in the specification reveals a similarity score of 95%. A search 
of the prior art confirms that SEQ ID NO: 2 has high homology to DNA 
Ligase encoding nucleic acids and that the next highest level of homology is 
to alpha-actin. However, the latter homology is only 50%. Based on the 
sequence homologies, the specification asserts that SEQ ID NO: 2 encodes a 
DNA ligase. 

Claim 1: An isolated and purified nucleic acid comprising SEQ ID NO: 2. 

Analysis: The following analysis includes the questions that need to be 
asked according to the guidelines and the answers to those questions based 
on the above facts: 

1) Based on the record, is there a "well established utility" for the 
claimed invention? Based upon applicant's disclosure and the results of the 
PTO search, there is no reason to doubt the assertion that SEQ ID NO: 2 
encodes a DNA ligase. Further, DNA ligases have a well-established use in 
the molecular biology art based on this class of protein's ability to ligate 
DNA. Consequently the answer to the question is yes. 
Note that if there is a well-established utility already associated with the 
claimed invention, the utility need not be asserted in the specification as 
filed. In order to determine whether the claimed invention has a well- 
established utility the examiner must determine that the invention has a 
specific, substantial and credible utility that would have been readily 
apparent to one of skill in the art. In this case SEQ ID NO: 2 was shown to 
encode a DNA ligase that the artisan would have recognized as having a 
specific, substantial and credible utility based on its enzymatic activity. 

Thus, the conclusion reached from this analysis is that a 35 U.S.C. § 
101 rejection and a 35 U.S.C. § 112, first paragraph, utility rejection should 
not be made. 

Example 11: Animals with Uncharacterized Human Genes 

Specification: Kidney cells from a patient with Polycystic Kidney (PCK) 
Disease have been used to make a cDNA library. From this library 8000 
nucleotide "fragments" have been sequenced but not yet used to express 
proteins in a transformed host cell nor have they been characterized in any 
other way. The 50 longest fragments, SEQ ID NO: 1-50, respectively, have 
been used to make transgenic mice. None of the 50 lines of mice have 
developed Polycystic Kidney Disease to date. The asserted utility is the use 
of the mice to research human genes from diseased human kidneys. The 
disease is inheritable, but chromosomal loci have not yet been identified. 
Neither the absence nor the presence of a specific protein has been identified with 
the disease condition. 
27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) 
from both ends of plasmid clones made from the DNA of five individuals. 
Two assembly strategies, a whole-genome assembly and a regional chromosome 
assembly, were used, each combining sequence data from Celera and the 
publicly funded genome effort. The public data were shredded into 550-bp 
segments to create a 2.9-fold coverage of those genome regions that had been 
sequenced, without including biases inherent in the cloning and assembly 
procedure used by the publicly funded group. This brought the effective coverage 
in the assemblies to eightfold, reducing the number and size of gaps in 
the final assembly over what would be obtained with 5.11-fold coverage. The 
two assembly strategies yielded very similar results that largely agree with 
independent mapping data. The assemblies effectively cover the euchromatic 
regions of the human chromosomes. More than 90% of the genome is in 
scaffold assemblies of 100,000 bp or more, and 25% of the genome is in 
scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 
26,588 protein-encoding transcripts for which there was strong corroborating 
evidence and an additional ~12,000 computationally derived genes with mouse 
matches or other weak supporting evidence. Although gene-dense clusters are 
obvious, almost half the genes are dispersed in low G+C sequence separated 
by large tracts of apparently noncoding sequence. Only 1.1% of the genome 
is spanned by exons, whereas 24% is in introns, with 75% of the genome being 
intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal 
lengths, are abundant throughout the genome and reveal a complex 
evolutionary history. Comparative genomic analysis indicates vertebrate expansions 
of genes associated with neuronal function, with tissue-specific developmental 
regulation, and with the hemostasis and immune systems. DNA 
sequence comparisons between the consensus sequence and publicly funded 
genome data provided locations of 2.1 million single-nucleotide polymorphisms 
(SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 
1250 on average, but there was marked heterogeneity in the level of polymorphism 
across the genome. Less than 1% of all SNPs resulted in variation in 
proteins, but the task of determining which SNPs have functional consequences 
remains an open challenge. 



Decoding of the DNA that constitutes the 
human genome has been widely anticipated 
for the contribution it will make toward understanding 
human evolution, the causation 
of disease, and the interplay between the 
environment and heredity in defining the human 
condition. A project with the goal of 
determining the complete nucleotide sequence 
of the human genome was first formally 
proposed in 1985 (1). In subsequent 
years, the idea met with mixed reactions in 
the scientific community (2). However, in 
1990, the Human Genome Project (HGP) was 
officially initiated in the United States under 
the direction of the National Institutes of 
Health and the U.S. Department of Energy 
with a 15-year, $3 billion plan 



DNA using chain-terminating nucleotide analogs 
(3). In the same year, the first human gene was 
isolated and sequenced (4). In 1986, Hood 
and co-workers (5) described an improvement 
in the Sanger sequencing method that included 
attaching fluorescent dyes to the nucleotides, 
which permitted them to be sequentially read 
by a computer. The first automated DNA sequencer, 
developed by Applied Biosystems in 
California in 1987, was shown to be successful 
when the sequences of two genes were obtained 
with this new technology (6). From early sequencing 
of human genomic regions (7), it 
became clear that cDNA sequences (which are 
reverse-transcribed from RNA) would be essential 
to annotate and validate gene predictions 
in the human genome. These studies were the 
basis in part for the development of the expressed 
sequence tag (EST) method of gene 
identification (8), which is a random selection, 
very high throughput sequencing approach to 
characterize cDNA libraries. The EST method 
led to the rapid discovery and mapping of human 
genes (9). The increasing numbers of human 
EST sequences necessitated the development 
of new computer algorithms to analyze 
large amounts of sequence data, and in 1993 at 
The Institute for Genomic Research (TIGR), an 
algorithm was developed that permitted assembly 
and analysis of hundreds of thousands of 
ESTs. This algorithm permitted characterization 
and annotation of human genes on the basis 
of 30,000 EST assemblies (10). 

The complete 49-kbp bacteriophage lambda 
genome sequence was determined by a 
shotgun restriction digest method in 1982 
(11). When considering methods for sequencing 
the smallpox virus genome in 1991 (12), 
a whole-genome shotgun sequencing method 
was discussed and subsequently rejected owing 
to the lack of appropriate software tools 
for genome assembly. However, in 1994, 
when a microbial genome-sequencing project 
was contemplated at TIGR, a whole-genome 
shotgun sequencing approach was considered 
possible with the TIGR EST assembly algorithm. 
In 1995, the 1.8-Mbp Haemophilus 
influenzae genome was completed by a 
whole-genome shotgun sequencing method 
(13). The experience with several subsequent 
genome-sequencing efforts established the 
broad applicability of this approach (14, 15). 

our intention to build a unique genome-sequencing 
facility, to determine the sequence 
of the human genome over a 3-year 
period. Here we report the penultimate milestone 
along the path toward that goal, a nearly 
complete sequence of the euchromatic portion 
neously map and sequence the human genome 
by means of end sequences from 150-kbp 
bacterial artificial chromosomes (BACs) 
(17, 18). The end sequences spanned by 
known distances provide long-range continuity 
across the genome. A modification of the 
BAC end-sequencing (BES) method was applied 
successfully to complete chromosome 2 
from the Arabidopsis thaliana genome (19). 

In 1997, Weber and Myers (20) proposed 
whole-genome shotgun sequencing of the 
human genome. Their proposal was not well 
received (21). However, by early 1998, as 
less than 5% of the genome had been sequenced, 
it was clear that the rate of progress 
in human genome sequencing worldwide 
was very slow (22), and the prospects for 
finishing the genome by the 2005 goal were 
uncertain. 

In early 1998, PE Biosystems (now Applied 
Biosystems) developed an automated, high-throughput 
capillary DNA sequencer, subsequently 
called the ABI PRISM 3700 DNA 
Analyzer. Discussions between PE Biosystems 
and TIGR scientists resulted in a plan to undertake 
the sequencing of the human genome with 
the 3700 DNA Analyzer and the whole-genome 
shotgun sequencing techniques developed at 
TIGR (23). Many of the principles of operation 
of a genome-sequencing facility were established 
in the TIGR facility (24). However, the 
facility envisioned for Celera would have a 
capacity roughly 50 times that of TIGR, and 
thus new developments were required for sample 
preparation and tracking and for whole-genome 
assembly. Some argued that the required 
150-fold scale-up from the H. influenzae 
genome to the human genome with its complex 
repeat sequences was not feasible (25). The 
Drosophila melanogaster genome was thus 
chosen as a test case for whole-genome assembly 
on a large and complex eukaryotic genome. 
In collaboration with Gerald Rubin and the 
Berkeley Drosophila Genome Project, the nucleotide 
sequence of the 120-Mbp euchromatic 
portion of the Drosophila genome was determined 
over a 1-year period (26-28). The Drosophila 
genome-sequencing effort resulted in 
two key findings: (i) that the assembly algorithms 
could generate chromosome assemblies 
with highly accurate order and orientation with 
substantially less than 10-fold coverage, and (ii) 
that undertaking multiple interim assemblies in 
place of one comprehensive final assembly was 
not of value. 

These findings, together with the dramatic 
changes in the public genome effort subsequent 
to the formation of Celera (29), led to a modified 
whole-genome shotgun sequencing approach 
to the human genome. We initially proposed 
to do 10-fold sequence coverage of the 
genome over a 3-year period and to make interim 
assembled sequence data available quarterly. 
The modifications included a plan to perform 
random shotgun sequencing to ~5-fold 
coverage and to use the unordered and unoriented 
BAC sequence fragments and subassemblies 
published in GenBank by the publicly 
funded genome effort (30) to accelerate the 
project. We also abandoned the quarterly announcements 
in the absence of interim assemblies 
to report. 

Although this strategy provided a reasonable 
result very early that was consistent with a 
whole-genome shotgun assembly with eightfold 
coverage, the human genome sequence is 
not as finished as the Drosophila genome was 
with an effective 13-fold coverage. However, it 
became clear that even with this reduced coverage 
strategy, Celera could generate an accurately 
ordered and oriented scaffold sequence of 
the human genome in less than 1 year. Human 
genome sequencing was initiated 8 September 
1999 and completed 17 June 2000. The first 
assembly was completed 25 June 2000, and the 
assembly reported here was completed 1 October 
2000. Here we describe the whole-genome 
random shotgun sequencing effort applied to 
the human genome. We developed two different 
assembly approaches for assembling the ~3 
billion bp that make up the 23 pairs of chromosomes 
of the Homo sapiens genome. Any GenBank-derived 
data were shredded to remove 
potential bias to the final sequence from chimeric 
clones, foreign DNA contamination, or 
misassembled contigs. Insofar as a correctly 
and accurately assembled genome sequence 
with faithful order and orientation of contigs 
is essential for an accurate analysis of the 
human genetic code, we have devoted a considerable 
portion of this manuscript to the 
documentation of the quality of our reconstruction 
of the genome. We also describe our 
preliminary analysis of the human genetic 
code on the basis of computational methods. 
Figure 1 (see fold-out chart associated with 
this issue; files for each chromosome can be 
found in Web fig. 1 on Science Online at 
www.sciencemag.org/cgi/content/full/291/5507/1304/DC1) 
provides a graphical overview 
of the genome and the features encoded 
in it. The detailed manual curation and interpretation 
of the genome are just beginning. 

To aid the reader in locating specific an- 
alytical sections, we' have divided the paper 
into seven broad Sections. A sunimary of the 
major results appears at the beginning of each 
section. ; - ; - 

1 Sources of DNA and Sequencing Methods 

2 Genome Assembly Strategy and
Characterization 

3 Gene Prediction and Annotation 

4 Genome Structure

5 Genome Evolution 

6 A Genome-Wide Examination of 
Sequence Variations 

7 An Overview of the Predicted Protein- 
Coding Genes in the Human Genome 

8 Conclusions 



1 Sources of DNA and Sequencing 
Methods

Summary. This section discusses the rationale
and ethical rules governing donor selection to
ensure ethnic and gender diversity along with
the methodologies for DNA extraction and
library construction. The plasmid library
construction is the first critical step in shotgun
sequencing. If the DNA libraries are not
uniform in size, nonchimeric, and a random
representation of the genome, then the
subsequent steps cannot accurately reconstruct
the genome sequence. We used automated
high-throughput DNA sequencing and the
computational infrastructure to enable efficient
tracking of enormous amounts of sequence
information (27.3 million sequence reads; 14.9
billion bp of sequence). Sequencing and
tracking from both
ends of plasmid clones from 2-, 10-, and 50-kbp 
libraries were essential to the computational 
reconstruction of the genome. Our evidence 
indicates that the accurate pairing rate of end 
sequences was greater than 98%. 

Various policies of the United States and the
World Medical Association, specifically the
Declaration of Helsinki, offer recommendations
for conducting experiments with human
subjects. We convened an Institutional Review
Board (IRB) (31) that helped us establish
the protocol for obtaining and using human
DNA and the informed consent process
used to enroll research volunteers for the
DNA-sequencing studies reported here. We
adopted several steps and procedures to protect
the privacy rights and confidentiality of
the research subjects (donors). These included
a two-stage consent process, a secure random
alphanumeric coding system for specimens
and records, circumscribed contact with
the subjects by researchers, and options for
off-site contact of donors. In addition, Celera
applied for and received a Certificate of
Confidentiality from the Department of Health
and Human Services. This Certificate authorized
Celera to protect the privacy of the
individuals who volunteered to be donors as
provided in Section 301(d) of the Public
Health Service Act, 42 U.S.C. 241(d).

Celera and the IRB believed that the initial
version of a completed human genome
should be a composite derived from multiple
donors of diverse ethnic backgrounds.
Prospective donors were asked, on a voluntary
basis, to self-designate an ethnogeographic
category (e.g., African-American, Chinese,
Hispanic, Caucasian, etc.). We enrolled 21
donors (32).

Three basic items of information from
each donor were recorded and linked by
confidential code to the donated sample: age,
sex, and self-designated ethnogeographic
group. From females, ~130 ml of whole,
heparinized blood was collected. From males,
~130 ml of whole, heparinized blood was
collected, as well as five specimens of semen,
collected over a 6-week period. Permanent
lymphoblastoid cell lines were created by
Epstein-Barr virus immortalization. DNA
from five subjects was selected for genomic
DNA sequencing: two males and three
females (one African-American, one
Asian-Chinese, one Hispanic-Mexican, and two
Caucasians) (see Web fig. 2 on Science Online
at www.sciencemag.org/cgi/content/291/5507/1304/DC1).
The decision of whose DNA to
sequence was based on a complex mix of
factors, including the goal of achieving
diversity as well as technical issues such as
the quality of the DNA libraries and
availability of immortalized cell lines.

1.1 Library construction and 
sequencing 

Central to the whole-genome shotgun sequencing
process is preparation of high-quality plasmid
libraries in a variety of insert sizes so that
pairs of sequence reads (mates) are obtained,
one read from each end of each plasmid insert.
High-quality libraries have an equal
representation of all parts of the genome, a
small number of clones without inserts, and no
contamination from such sources as the
mitochondrial genome and Escherichia coli
genomic DNA. DNA from each donor was used to
construct plasmid libraries in one or more of
three size classes: 2 kbp, 10 kbp, and 50 kbp
(Table 1) (33).

In designing the DNA-sequencing process,
we focused on developing a simple
system that could be implemented in a robust
and reproducible manner and monitored
effectively (Fig. 2) (34).
Current sequencing protocols are based on
the dideoxy sequencing method (35), which
typically yields only 500 to 750 bp of sequence
per reaction. This limitation on read length has
made monumental gains in throughput a
prerequisite for the analysis of large eukaryotic
genomes. We accomplished this at the Celera
facility, which occupies about 30,000 square
feet of laboratory space and produces sequence
data continuously at a rate of 175,000 total
reads per day. The DNA-sequencing facility is
supported by a high-performance computational
facility (35).

The process for DNA sequencing was modular
by design and automated. Intermodule
sample backlogs allowed four principal
modules to operate independently: (i) library
transformation, plating, and colony
picking; (ii) DNA template preparation;
(iii) dideoxy sequencing reaction set-up
and purification; and (iv) sequence
determination with the ABI PRISM 3700 DNA
Analyzer. Because the inputs and outputs
of each module have been carefully
matched and sample backlogs are continuously
managed, sequencing has proceeded
without a single day's interruption since the
initiation of the Drosophila project in May
1999. The ABI 3700 is a fully automated
capillary array sequencer and as such can
be operated with a minimal amount of
hands-on time, currently estimated at about
15 min per day. The capillary system also
facilitates correct associations of sequencing
traces with samples through the elimination
of manual sample loading and lane-tracking
errors associated with slab gels.
About 65 production staff were hired and
trained, and were rotated on a regular basis
through the four production modules. A
central laboratory information management
system (LIMS) tracked all sample plates by
unique bar code identifiers. The facility was
supported by a quality control team that
performed raw material and in-process testing
and a quality assurance group with
responsibilities including document control,
validation, and auditing of the facility.
Critical to the success of the scale-up was the
validation of all software and instrumentation
before implementation, and production-scale
testing of any process changes.

1.2 Trace processing
An automated trace-processing pipeline has
been developed to process each sequence file
(37). After quality and vector trimming, the
average trimmed sequence length was 543
bp, and the sequencing accuracy was
exponentially distributed with a mean of 99.5%
and with less than 1 in 1000 reads being less
than 98% accurate (26). Each trimmed
sequence was screened for matches to
contaminants including sequences of vector
alone, E. coli genomic DNA, and human
mitochondrial DNA. The entire read for any
sequence with a significant match to a
contaminant was discarded. A total of 713 reads
matched E. coli genomic DNA and 2114 reads
matched the human mitochondrial genome.

Table 1. Celera-generated data input into assembly.

                                      Number of reads for different insert libraries        Total number of
                          Individual          2 kbp       10 kbp       50 kbp        Total       base pairs

No. of sequencing reads   A                       0            0    2,767,357    2,767,357    1,502,674,851
                          B              11,736,757    7,467,755       66,930   19,271,442   10,464,393,006
                          C                 853,819      881,290            0    1,735,109      942,164,187
                          D                 952,523    1,046,815            0    1,999,338    1,085,640,534
                          F                       0    1,498,607            0    1,498,607      813,743,601
                          Total          13,543,099   10,894,467    2,834,287   27,271,853   14,808,616,179

Fold sequence coverage    A                       0            0         0.52         0.52
(2.9-Gb genome)           B                    2.20         1.40         0.01         3.61
                          C                    0.16         0.17            0         0.32
                          D                    0.18         0.20            0         0.37
                          F                       0         0.28            0         0.28
                          Total                2.54         2.04         0.53         5.11

Fold clone coverage       A                       0            0        18.39        18.39
                          B                    2.96        11.26         0.44        14.67
                          C                    0.22         1.33            0         1.54
                          D                    0.24         1.58            0         1.82
                          F                       0         2.26            0         2.26
                          Total                3.42        16.43        18.84        38.68

Insert size* (mean)       Average          1,951 bp    10,800 bp
Insert size* (SD)         Average             6.10%        8.10%
% Mates†                  Average             74.50        80.80

*Insert size and SD are calculated from assembly of

1.3 Quality assessment and control
The importance of the base-pair level accuracy
of the sequence data increases as the
size and repetitive nature of the genome to
be sequenced increases. Each sequence
read must be placed uniquely in the genome,
and even a modest error rate can
reduce the effectiveness of assembly. In
addition, maintaining the validity of mate-pair
information is absolutely critical for
the algorithms described below. Procedural
controls were established for maintaining
the validity of sequence mate-pairs as
sequencing reactions proceeded through the
process, including strict rules built into the
LIMS. The accuracy of sequence data produced
by the Celera process was validated
in the course of the Drosophila genome
project. By collecting data for the
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entire human genome in a single facility, 
we were able to ensure uniform quality 
standards and the cost advantages associat- 
ed with automation, an economy of scale, 
and process consistency. 

2 Genome Assembly Strategy and
Characterization 

Summary. We describe in this section the two
approaches that we used to assemble the
genome. One method involves the computational
combination of all sequence reads with
shredded data from GenBank to generate an
independent, nonbiased view of the genome. The
second approach involves clustering all of the
fragments to a region or chromosome on the basis
of mapping information. The clustered data
were then shredded and subjected to
computational assembly. Both approaches provided
essentially the same reconstruction of assembled
DNA sequence with proper order and
orientation. The second method provided slightly
greater sequence coverage (fewer gaps) and
was the principal sequence used for the analysis
phase. In addition, we document the
completeness and correctness of this assembly
process and provide a comparison to the public
genome sequence, which was reconstructed
largely by an independent BAC-by-BAC approach.
Our assemblies effectively covered the
euchromatic regions of the human chromosomes.
More than 90% of the genome was in scaffold
assemblies of 100,000 bp or greater, and 25% of
the genome was in scaffolds of 10 million bp or
larger.

Shotgun sequence assembly is a classic
example of an inverse problem: given a set
of reads randomly sampled from a target
sequence, reconstruct the order and the
position of those reads in the target. Genome
assembly algorithms developed for Drosophila
have now been extended to assemble
the ~25-fold larger human genome. Celera
assemblies consist of a set of contigs that are
ordered and oriented into scaffolds that are then
mapped to chromosomal locations by using
known markers. The contigs consist of a
collection of overlapping sequence reads that
provide a consensus reconstruction for a
contiguous interval of the genome. Mate pairs
are a central component of the assembly strategy.
They are used to produce scaffolds in which the
size of gaps between consecutive contigs is
known with reasonable precision. This is
accomplished by observing that a pair of reads,
one of which is in one contig, and the other of
which is in another, implies an orientation and
distance between the two contigs (Fig. 3).
Finally, our assemblies did not incorporate all
reads into the final set of reported scaffolds.
This set of unincorporated reads is termed
"chaff," and typically consisted of reads from
within highly repetitive regions, data from other
organisms introduced through various routes as
found in many genome projects, and data of
poor quality or with untrimmed vector.
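
The way a single mate pair constrains the gap between two contigs can be sketched in a few lines of Python. This is only an illustration of the geometry described above, not Celera's Scaffolder: the function name, the coordinate convention, and the assumption that both contigs are already oriented so that the forward read lies in the left contig and its mate in the right contig are all ours.

```python
from statistics import mean

def implied_gap(insert_mean, left_len, left_read_start, right_read_end):
    """Estimate the gap (bp) between two contigs joined by one mate pair.

    insert_mean     -- mean insert size of the library (bp)
    left_len        -- length of the left contig (bp)
    left_read_start -- 0-based start of the forward read in the left contig
    right_read_end  -- distance from the start of the right contig to the
                       far end of the mated read (bp)
    """
    # The insert spans: tail of left contig + gap + head of right contig.
    left_part = left_len - left_read_start
    return insert_mean - left_part - right_read_end

# Two hypothetical 10-kbp mate pairs bridging the same pair of contigs.
estimates = [
    implied_gap(10_800, 20_000, 15_000, 3_000),
    implied_gap(10_800, 20_000, 16_000, 4_100),
]
gap_estimate = mean(estimates)  # averaging several pairs tightens the estimate
```

A negative result would suggest that the two contigs overlap rather than being separated by a gap.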
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2.1 Assembly data sets 
We used two independent sets of data for our
assemblies. The first was a random shotgun
data set of 27.27 million reads of average length
543 bp produced at Celera. This consisted
largely of mate-pair reads from 16 libraries
constructed from DNA samples taken from five
different donors. Libraries with insert sizes of 2,
10, and 50 kbp were used. By looking at how
mate pairs from a library were positioned in
known sequenced stretches of the genome, we
were able to characterize the range of insert
sizes in each library and determine a mean and
standard deviation. Table 1 details the number
of reads, sequencing coverage, and clone
coverage achieved by the data set. The clone
coverage is the coverage of the genome in cloned
DNA, considering the entire insert of each
clone that has sequence from both ends. The
clone coverage provides a measure of the
amount of physical DNA coverage of the
genome. Assuming a genome size of 2.9 Gbp, the
Celera trimmed sequences gave a 5.1X
coverage of the genome, and clone coverage was
3.42X, 16.40X, and 18.84X for the 2-, 10-, and
50-kbp libraries, respectively, for a total of
38.7X clone coverage.
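
The coverage figures quoted above can be checked with simple arithmetic. The sketch below is a hedged reading of the definitions in the text: sequence coverage as total read bases over genome size, and clone coverage as the number of clones with both ends sequenced times the mean insert size over genome size. The clone-coverage formula is our assumption (the text does not spell it out), and the 10-kbp inputs are the Table 1 values for that library.

```python
GENOME_BP = 2.9e9  # genome size assumed in the text

def sequence_coverage(n_reads, mean_read_len, genome_bp=GENOME_BP):
    """Fold coverage of the genome in sequenced bases."""
    return n_reads * mean_read_len / genome_bp

def clone_coverage(n_reads, mate_fraction, mean_insert, genome_bp=GENOME_BP):
    """Fold coverage of the genome in cloned DNA (assumed formula)."""
    # Two mated reads come from one clone, so mated reads / 2 = clones.
    n_clones = n_reads * mate_fraction / 2
    return n_clones * mean_insert / genome_bp

seq_cov = sequence_coverage(27_271_853, 543)
cov_10k = clone_coverage(10_894_467, 0.808, 10_800)
print(f"{seq_cov:.2f}X sequence, {cov_10k:.2f}X clone (10 kbp)")
```

Under these assumptions the results land close to the 5.1X and 16.40X reported in the text; small differences are expected because the published values were presumably computed per library rather than from rounded averages.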

The second data set was from the publicly
funded Human Genome Project (PFP) and is
primarily derived from BAC clones (50). The
BAC data input to the assemblies came from a
download of GenBank on 1 September 2000
(Table 2) totaling 4443.3 Mbp of sequence.
The data for each BAC is deposited at one of
four levels of completion. Phase 0 data are a set
of generally unassembled sequencing reads
from a very light shotgun of the BAC, typically
less than 1X. Phase 1 data are unordered
assemblies of contigs, which we call BAC contigs
or bactigs. Phase 2 data are ordered assemblies
of bactigs. Phase 3 data are complete BAC
sequences. In the past 2 years the PFP has
focused on a product of lower quality and
completeness, but on a faster time-course, by
concentrating on the production of Phase 1 data
from a 3X to 4X light-shotgun of each BAC
clone.

We screened the bactig sequences for
contaminants by using the BLAST algorithm
against three data sets: (i) vector sequences
in Univec core (38), filtered for a 25-bp
match at 98% sequence identity at the ends
of the sequence and a 30-bp match internal
to the sequence; (ii) the nonhuman portion
of the High Throughput Genomic (HTG)
Sequences division of GenBank (39),
filtered at 200 bp at 98%; and (iii) the
nonredundant nucleotide sequences from
GenBank without primate and human virus
entries, filtered at 200 bp at 98%. Whenever
25 bp or more of vector was found within
50 bp of the end of a contig, the tip up to
the matching vector was excised. Under
these criteria we removed 2.6 Mbp of
possible contaminant and vector from the
Phase 3 data, 61.0 Mbp from the Phase 1
and 2 data, and 16.1 Mbp from the Phase 0
data (Table 2). This left us with a total of
4363.7 Mbp of PFP sequence data: 20%
finished, 75% rough-draft (Phase 1 and 2),
and 5% single sequencing reads (Phase 0).
An additional 104,018 BAC end-sequence
mate pairs were also downloaded and
included in the data sets for both assembly
processes (18).
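
The tip-excision rule stated above (25 bp or more of vector within 50 bp of a contig end) can be expressed as a small function. This is a sketch under our own assumptions: the match coordinates are taken as given (for example, from a BLAST hit), "within 50 bp of the end" is read as the hit lying less than 50 bp from that end, and the vector itself is removed together with the tip. The function and parameter names are hypothetical.

```python
def excise_vector_tip(contig, match_start, match_end, min_match=25, window=50):
    """Trim a contig end when a vector hit lies close to it.

    match_start, match_end -- 0-based, end-exclusive coordinates of the
                              vector hit within the contig.
    """
    if match_end - match_start < min_match:
        return contig                      # hit too short to act on
    if match_start < window:               # hit near the left end
        return contig[match_end:]
    if len(contig) - match_end < window:   # hit near the right end
        return contig[:match_start]
    return contig                          # internal hit: left untouched here
```

Internal hits are deliberately left alone in this sketch; the text applies a separate 30-bp internal-match filter whose handling is not described in enough detail to reproduce.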

2.2 Assembly strategies
Two different approaches to assembly were
pursued. The first was a whole-genome
assembly process that used Celera data and the
PFP data in the form of additional synthetic
shotgun data, and the second was a
compartmentalized assembly process that first
partitioned the Celera and PFP data into sets
localized to large chromosomal segments and
then performed ab initio shotgun assembly on
each set. Figure 4 gives a schematic of the
overall process flow.

Table 2. GenBank data input into assembly.
[Table body not legible. For each sequencing center (Whitehead
Institute/MIT Center for Genome Research, USA; Washington University,
USA; Baylor College of ...; Production Sequencing Facility, DOE Joint
Genome Institute, USA; The Institute of Physical and Chemical Research
(RIKEN), Japan; Sanger Centre, UK; Others; All centers combined), the
table lists the number of accession records, number of contigs, total
vector masked (bp), total contaminant masked (bp), and average contig
length (bp) by completion phase (0, 1 and 2, 3).]

For the whole-genome assembly, the PFP
data were first disassembled or "shredded" into a
synthetic shotgun data set of 550-bp reads that
form a perfect 2X covering of the bactigs. This
resulted in 16.05 million "faux" reads that were
sufficient to cover the genome 2.96X because
of redundancy in the BAC data set, without
incorporating the biases inherent in the PFP
assembly process. The combined data set of
43.32 million reads (8X), and all associated
mate-pair information, were then subjected to
our whole-genome assembly algorithm to
produce a reconstruction of the genome. Neither
the location of a BAC in the genome nor its
assembly of bactigs was used in this process.
Bactigs were shredded into reads because we
found strong evidence that 2.13% of them were
misassembled (40). Furthermore, BAC location
information was ignored because some BACs
were not correctly placed on the PFP physical
map and because we found strong evidence that
at least 2.2% of the BACs contained sequence
data that were not part of the given BAC (41),
possibly as a result of sample-tracking errors
(see below). In short, we performed a true, ab
initio whole-genome assembly in which we
took the expedient of deriving additional
sequence coverage, but not mate pairs, assembled
bactigs, or genome locality, from some
externally generated data.
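
The shredding step can be illustrated with a short Python sketch. It is not the authors' code: the function name and the tiling scheme (fixed-length reads starting every read_len / coverage bases, with one extra read anchored at the end so the last bases are not lost) are assumptions chosen to give roughly 2X coverage of the interior of each bactig.

```python
def shred(bactig, read_len=550, coverage=2):
    """Cut a bactig into overlapping fixed-length faux reads."""
    if len(bactig) <= read_len:
        return [bactig]                    # too short to shred
    step = read_len // coverage            # 275 bp for 550-bp reads at 2X
    last_start = len(bactig) - read_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)          # make the final read reach the end
    return [bactig[s:s + read_len] for s in starts]
```

As a consistency check on the text, 16.05 million reads of 550 bp amount to about 8.8 Gbp, roughly twice the 4363.7 Mbp of screened PFP sequence, and 27.27 million Celera reads plus 16.05 million faux reads give the combined 43.32 million.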

In the compartmentalized shotgun assembly
(CSA), Celera and PFP data were partitioned
into the largest possible chromosomal segments
or "components" that could be determined with
confidence, and then shotgun assembly was
applied to each partitioned subset wherein the
bactig data were again shredded into faux reads
to ensure an independent ab initio assembly of
the component. By subsetting the data in this
way, the overall computational effort was
reduced and the effect of interchromosomal
duplications was ameliorated. This also resulted
in a reconstruction of the genome that was
relatively independent of the whole-genome
assembly results so that the two assemblies
could be compared for consistency. The quality
of the partitioning into components was crucial
so that different genome regions were not mixed
together. We constructed components from (i) the
longest scaffolds of the sequence from each
BAC and (ii) assembled scaffolds of data unique
to Celera's data set. The BAC assemblies were
obtained by a combining assembler that used the
bactigs and the 5X Celera data mapped to those
bactigs as input. This effort was undertaken as
an interim step solely because the more accurate
and complete the scaffold for a given sequence
stretch, the more accurately one can tile these
scaffolds into contiguous components on the
basis of sequence overlap and mate-pair
information. We further visually inspected and
curated the scaffold tiling of the components to
further increase its accuracy. For the final CSA
assembly, all but the partitioning was ignored,
and an independent, ab initio reconstruction of
the sequence in each component was obtained
by applying our whole-genome assembly
algorithm to the partitioned, relevant Celera
data and the shredded, faux reads of the
partitioned, relevant bactig data.

2.3 Whole-genome assembly
The algorithms used for whole-genome
assembly (WGA) of the human genome were
enhancements to those used to produce the
sequence of the Drosophila genome reported
in detail in (28).

The WGA assembler consists of a pipeline
composed of five principal stages: Screener,
Overlapper, Unitigger, Scaffolder, and Repeat
Resolver, respectively. The Screener finds
and marks all microsatellite repeats with less
than a 6-bp element, and screens out all
known interspersed repeat elements, including
Alu, Line, and ribosomal DNA. Marked
regions get searched for overlaps, whereas
screened regions do not get searched, but can
be part of an overlap that involves unscreened
matching segments.

The Overlapper compares every read
against every other read in search of complete
end-to-end overlaps of at least 40 bp and with
no more than 6% differences in the match.
Because all data are scrupulously
vector-trimmed, the Overlapper can insist on
complete overlap matches. Computing the set of
all overlaps took roughly 10,000 CPU hours
with a suite of four-processor Alpha SMPs
with 4 gigabytes of RAM. This took 4 to 5
days in elapsed time with 40 such machines
operating in parallel.
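
The overlap criterion (at least 40 bp, no more than 6% differences) can be illustrated with a deliberately naive Python function. This is a sketch, not the Overlapper: it counts substitutions only (no insertions or deletions), tests a single ordered pair of reads, and takes time proportional to the square of the read length, whereas a production overlapper would use indexing and alignment. The function name and defaults are ours.

```python
def end_overlap(a, b, min_len=40, max_diff=0.06):
    """Length of the longest suffix of `a` that matches a prefix of `b`
    with at most `max_diff` mismatches per base, or 0 if none qualifies."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        suffix, prefix = a[-length:], b[:length]
        mismatches = sum(x != y for x, y in zip(suffix, prefix))
        if mismatches <= max_diff * length:
            return length
    return 0
```

Requiring the match to run all the way to the end of both reads is what the text calls a complete end-to-end overlap; it is only safe because the reads have already been vector-trimmed.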

Every overlap computed above is statistically
a 1-in-10^17 event and thus not a coincidental
event. What makes assembly combinatorially
difficult is that while many overlaps
are actually sampled from overlapping
regions of the genome, and thus imply that
the sequence reads should be assembled
together, even more overlaps are actually from
two distinct copies of a low-copy repeated
element not screened above, thus constituting
an error if put together. We call the former
"true overlaps" and the latter "repeat-induced
overlaps." The assembler must avoid choosing
repeat-induced overlaps, especially early
in the process.

We achieve this objective in the Unitigger.
We first find all assemblies of reads that
appear to be uncontested with respect to all
other reads. We call the contigs formed from
these subassemblies unitigs (for uniquely
assembled contigs). Formally, these unitigs are
the uncontested interval subgraphs of the
graph of all overlaps (42). Unfortunately,
although empirically many of these assemblies
are correct (and thus involve only true
overlaps), some are in fact collections of reads
from several copies of a repetitive element
that have been overcollapsed into a single
subassembly. However, the overcollapsed
unitigs are easily identified because their
average coverage depth is too high to be
consistent with the overall level of sequence
coverage. We developed a simple statistical
discriminator that gives the logarithm of the
odds ratio that a unitig is composed of unique
DNA or of a repeat consisting of two or more
copies. The discriminator, set to a sufficiently
stringent threshold, identifies a subset of the
unitigs that we are certain are correct. In
addition, a second, less stringent threshold
identifies a subset of remaining unitigs very
likely to be correctly assembled, of which we
select those that will consistently scaffold
(see below), and thus are again almost certain
to be correct. We call the union of these two
sets U-unitigs. Empirically, we found from a
6X simulated shotgun of human chromosome
22 that we get U-unitigs covering 98% of the
stretches of unique DNA that are >2 kbp
long. We are further able to identify the
boundary of the start of a repetitive element
at the ends of a U-unitig and leverage this so
that U-unitigs span more than 93% of all
singly interspersed Alu elements and other
100- to 400-bp repetitive segments.
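
The text does not give the formula for the discriminator, so the following Python sketch is an assumption: it models read starts as a Poisson process and compares the likelihood of the observed read count under a one-copy model against a two-copy model. The function name and parameters are hypothetical.

```python
import math

def unique_log_odds(unitig_len, n_reads, total_reads, genome_len):
    """Natural-log odds that a unitig is unique rather than a collapsed
    two-copy repeat, under a Poisson model of read starts (assumed)."""
    expected = unitig_len * total_reads / genome_len  # reads expected if unique
    # ln[Poisson(n; expected) / Poisson(n; 2 * expected)] simplifies to:
    return expected - n_reads * math.log(2)

# A 10-kbp unitig at the Celera read density (27.27 million reads, 2.9 Gbp)
# is expected to hold about 94 reads if it is unique.
score_unique = unique_log_odds(10_000, 94, 27_270_000, 2_900_000_000)
score_repeat = unique_log_odds(10_000, 188, 27_270_000, 2_900_000_000)
```

In this model a unitig with the expected read depth scores well above zero, while one with twice the expected depth scores well below it, which matches the text's observation that overcollapsed unitigs stand out by their coverage.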

The result of running the Unitigger was
thus a set of correctly assembled subcontigs
covering an estimated 73.6% of the human
genome. The Scaffolder then proceeded to
use mate-pair information to link these
together into scaffolds. When there are two or
more mate pairs that imply that a given pair
of U-unitigs are at a certain distance and
orientation with respect to each other, the
probability of this being wrong is again
roughly 1 in 10^10, assuming that mate pairs
are false less than 2% of the time. Thus, one
can with high confidence link together all
U-unitigs that are linked by at least two 2- or
10-kbp mate pairs, producing
intermediate-sized scaffolds that are then
recursively linked together by confirming
50-kbp mate pairs and BAC end sequences. This
process yielded scaffolds that are on the order
of megabase pairs in size with gaps between
their contigs that generally correspond to
repetitive elements and occasionally to small
sequencing gaps. These scaffolds reconstruct
the majority of the unique sequence within a
genome.
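
The "at least two mate pairs" linking rule can be sketched with a union-find structure. This is a simplified illustration, not the Scaffolder: it only groups unitigs that share two or more mate pairs and ignores whether the pairs agree on distance and orientation, which the real process also checks. All names are hypothetical.

```python
from collections import Counter

def link_unitigs(mate_links, min_pairs=2):
    """Group unitigs into scaffolds.

    mate_links -- iterable of (unitig_a, unitig_b) tuples, one per mate
                  pair whose two reads fall in different unitigs.
    Returns a list of sets, each holding the unitigs of one scaffold.
    """
    support = Counter(tuple(sorted(pair)) for pair in mate_links)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for (a, b), count in support.items():
        root_a, root_b = find(a), find(b)   # also registers both unitigs
        if count >= min_pairs:              # keep only confirmed links
            parent[root_a] = root_b

    groups = {}
    for unitig in parent:
        groups.setdefault(find(unitig), set()).add(unitig)
    return list(groups.values())
```

A unitig joined to its neighbors by only a single mate pair stays in its own group, reflecting the text's point that one pair alone is not trusted.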

For the Drosophila assembly, we engaged
in a three-stage repeat resolution strategy
where each stage was progressively more
aggressive and thus more likely to make a
mistake. For the human assembly, we
continued to use the first "Rocks" substage where
all unitigs with a good, but not definitive,
discriminator score are placed in a scaffold
gap. This was done with the condition that
two or more mate pairs with one of their
reads already in the scaffold unambiguously
place the unitig in the given gap. We estimate
the probability of inserting a unitig into an
incorrect gap with this strategy to be less than
10^-7 based on a probabilistic analysis.
We revised the ensuing "Stones" substage
of the human assembly, making it more like
the mechanism suggested in our earlier work
(43). For each gap, every read R that is placed
in the gap by virtue of its mated pair M being
in a contig of the scaffold and implying R's
placement is collected. Celera's mate-pairing
information is correct more than 99% of the
time. Thus, nearly all, but not all, of the
reads in the set belong in the gap, and when
a read does not belong it rarely agrees with
the remainder of the reads. Therefore, we
simply assemble this set of reads within the
gap, eliminating any reads that conflict with
the assembly. This operation proved much
more reliable than the one it replaced for the
Drosophila assembly; in the assembly of a
simulated shotgun data set of human
chromosome 22, all stones were placed correctly.

The final method of resolving gaps is to
fill them with assembled BAC data that cover
the gap. We call this external gap "walking."
We did not include the very aggressive
"Pebbles" substage described in our Drosophila
work, which made enough mistakes so as to
produce repeat reconstructions for long
interspersed elements whose quality was only
99.62% correct. We decided that for the human
genome it was philosophically better not
to introduce a step that was certain to produce
less than 99.99% accuracy. The cost was a
somewhat larger number of gaps of somewhat
larger size.
At the final stage of the assembly process,
and also at several intermediate points, a
consensus sequence of every contig is produced.
Our algorithm is driven by the principle
of maximum parsimony, with
quality-value-weighted measures for evaluating
each base. The net effect is a Bayesian estimate
of the correct base to report at each position.
Consensus generation uses Celera data whenever
it is present. In the event that no Celera
data cover a given region, the BAC data
sequence is used.
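
A quality-weighted base call can be illustrated as follows. This sketch is a stand-in for the consensus step described above, not the authors' algorithm: it simply sums the quality values of the reads supporting each base in an alignment column and reports the base with the largest total. The input layout and names are assumptions, and every column is assumed to be covered by at least one read.

```python
from collections import defaultdict

def consensus(columns):
    """Call one base per alignment column.

    columns -- list of columns; each column is a non-empty list of
               (base, quality) tuples, one per read covering the position.
    """
    called = []
    for column in columns:
        weight = defaultdict(int)
        for base, quality in column:
            weight[base] += quality        # quality-value-weighted vote
        # Sorting first makes ties resolve to the alphabetically first base.
        called.append(max(sorted(weight), key=weight.get))
    return "".join(called)
```

With this weighting, one high-quality read can outvote several low-quality reads, which is the intuition behind using quality values rather than a plain majority.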

A key element of achieving a WGA of the
human genome was to parallelize the Overlapper
and the central consensus
sequence-constructing subroutines. In addition,
memory was a real issue: a straightforward
application of the software we had built for
Drosophila would have required a computer with
a 600-gigabyte RAM. By making the Overlapper
and Unitigger incremental, we were able to
achieve the same computation with a maximum of
instantaneous usage of 28 gigabytes of RAM.
Moreover, the incremental nature of the first
three stages allowed us to continually update
the state of this part of the computation as
data were delivered and then perform a 7-day
run to complete Scaffolding and Repeat
Resolution whenever desired. For our assembly
operations, the total compute infrastructure
consists of 10 four-processor SMPs with 4
gigabytes of memory per cluster (Compaq's ES40,
Regatta) and a 16-processor NUMA machine with
64 gigabytes of memory (Compaq's GS160,
Wildfire). The total compute for a run of the
assembler was roughly 20,000 CPU hours.

The assembly of Celera's data, together
with the shredded bactig data, produced a set of
scaffolds totaling 2.848 Gbp in span and
consisting of 2.586 Gbp of sequence. The chaff,
or set of reads not incorporated in the assembly,
numbered 11.27 million (26%), which is
consistent with our experience for Drosophila.
More than 84% of the genome was covered by
scaffolds >100 kbp long, and these averaged
91% sequence and 9% gaps with a total of
2.297 Gbp of sequence. There were a total of
93,857 gaps among the 1637 scaffolds >100
kbp. The average scaffold size was 1.5 Mbp,
the average contig size was 24.06 kbp, and the
average gap size was 2.43 kbp, where the
distribution of each was essentially exponential.
More than 50% of all gaps were less than 500
bp long, >62% of all gaps were less than 1 kbp
long, and no gap was >100 kbp long. Similarly,
more than 65% of the sequence is in contigs
>30 kbp, more than 31% is in contigs >100
kbp, and the largest contig was 1.22 Mbp long.
Table 3 gives detailed summary statistics for
the structure of this assembly with a direct
comparison to the compartmentalized shotgun
assembly.
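
The summary statistics in this paragraph follow directly from the Table 3 entries for whole-genome assembly scaffolds >100 kbp. The short Python check below recomputes them; the variable names are ours, and the chaff share uses the read counts quoted in the text (11.27 million of 43.32 million).

```python
# Whole-genome assembly, scaffolds >100 kbp (values from Table 3).
span_bp = 2_525_334_447      # bp in scaffolds, including gaps
sequence_bp = 2_297_678_935  # bp in contigs
n_scaffolds = 1_637
n_contigs = 95_494
n_gaps = 93_857

pct_sequence = 100 * sequence_bp / span_bp       # share of span that is sequence
avg_scaffold = span_bp / n_scaffolds             # bp per scaffold
avg_contig = sequence_bp / n_contigs             # bp per contig
avg_gap = (span_bp - sequence_bp) / n_gaps       # bp per intrascaffold gap
chaff_share = 100 * 11.27 / 43.32                # percent of reads left out
```

The gap count also equals the number of contigs minus the number of scaffolds, as expected when every scaffold is a chain of contigs separated by gaps.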

2.4 Compartmentalized shotgun 
assembly 

In addition to the WGA approach, we pursued
a localized assembly approach that was
intended to subdivide the genome into
segments, each of which could be shotgun
assembled individually. We expected that this
would help in resolution of large
interchromosomal duplications and improve the
statistics for calculating U-unitigs. The
compartmentalized assembly process involved
clustering Celera reads and bactigs into large,
multiple megabase regions of the genome,
and then running the WGA assembler on the
Celera data and shredded, faux reads
obtained from the bactig data.

Table 3. Scaffold statistics for whole-genome and compartmentalized shotgun assemblies.

                                                                  Scaffold size
                                                All        >30 kbp       >100 kbp       >500 kbp      >1000 kbp

Compartmentalized shotgun assembly
No. of bp in scaffolds                2,905,568,203  2,748,892,430  2,700,489,905  2,489,357,260  2,248,689,128
  (including intrascaffold gaps)
No. of bp in contigs                  2,653,979,733  2,524,251,302  2,491,538,372  2,320,648,201  2,106,521,902
No. of scaffolds                             53,591          2,845          1,935          1,060            721
No. of contigs                              170,033        112,207        107,199         93,138         82,009
No. of gaps                                 116,442        109,362        105,264         92,078         81,288
No. of gaps ≤1 kbp                           72,091         69,175         67,289         59,915         53,354
Average scaffold size (bp)                   54,217        966,219      1,395,602      2,348,450      3,118,848
Average contig size (bp)                     15,609         22,496         23,242         24,916         25,686
Average intrascaffold gap size (bp)           2,161          2,054          1,985          1,832          1,749
Largest contig (bp)                       1,988,321      1,988,321      1,988,321      1,988,321      1,988,321
% of total contigs                              100             95             94             87             79

Whole-genome assembly
No. of bp in scaffolds                2,847,890,390  2,574,792,618  2,525,334,447  2,328,535,466  2,140,943,032
  (including intrascaffold gaps)
No. of bp in contigs                  2,586,634,108  2,334,343,339  2,297,678,935  2,143,002,184  1,983,305,432
No. of scaffolds                            118,968          2,507          1,637            818            554
No. of contigs                              221,036         99,189         95,494         84,641         76,285
No. of gaps                                 102,068         96,682         93,857         83,823         75,731
No. of gaps ≤1 kbp                           62,356         60,343         59,156         54,079         49,592
Average scaffold size (bp)                   23,938      1,027,041      1,542,660      2,846,620      3,864,518
Average contig size (bp)                     11,702         23,534         24,061         25,319         25,999
Average intrascaffold gap size (bp)           2,560          2,487          2,426          2,213          2,082
Largest contig (bp)                       1,224,073      1,224,073      1,224,073      1,224,073      1,224,073
% of total contigs                              100             90             89             83             77

The first phase of the CSA strategy was to separate Celera reads into
those that matched the BAC contigs for a particular PFP BAC entry, and
those that did not match any public data. Such matches must be
guaranteed to properly place a Celera read, so all reads were first
masked against a library of common repetitive elements, and only
matches of at least 40 bp to unmasked portions of the read constituted
a hit. Of Celera's 27.27 million reads, 20.76 million matched a bactig
and another 0.62 million reads, which did not have any matches, were
nonetheless identified as belonging in the region of the bactig's BAC
because their mate matched the bactig. Of the remaining reads, 2.92
million were completely screened out and so could not be matched, but
the other 2.97 million reads had unmasked sequence totaling 1.189 Gbp
that were not found in the GenBank data set. Because the Celera data
are 5.11X redundant, we estimate that 240 Mbp of unique Celera
sequence is not in the GenBank data set.
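
The read bookkeeping in this paragraph can be checked with a few lines
of arithmetic. The sketch below only re-adds the four read categories
and divides the unmatched sequence by the stated redundancy; the
variable names are illustrative. The simple quotient comes out slightly
below the 240 Mbp quoted above, so the published estimate presumably
involves rounding or adjustments not described here.

```python
# Read counts (millions) as reported in the text.
total_reads = 27.27
matched_bactig = 20.76
placed_by_mate = 0.62
fully_masked = 2.92
unmatched = 2.97

accounted = matched_bactig + placed_by_mate + fully_masked + unmatched

# Unmasked, unmatched sequence (Gbp) and the stated redundancy of the Celera data.
unmatched_gbp = 1.189
redundancy = 5.11
unique_mbp = unmatched_gbp * 1000 / redundancy

print(f"reads accounted for: {accounted:.2f} of {total_reads:.2f} million")
print(f"estimated unique sequence: {unique_mbp:.0f} Mbp")
```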

In the next step of the CSA process, a combining assembler took the
relevant 5X Celera reads and bactigs for a BAC entry, and produced an
assembly of the combined data for that locale. These high-quality
sequence reconstructions were a transient result whose utility was
simply to provide more reliable information for the purposes of their
tiling into sets of overlapping and adjacent scaffold sequences in the
next step. In outline, the combining assembler first examines the set
of matching Celera reads to determine if there are excessive pileups
indicative of unscreened repetitive elements. Wherever these occur,
reads in the repeat region whose mates have not been mapped to
consistent positions are removed. Then all sets of mate pairs that
consistently imply the same relative position of two bactigs are
bundled into a link and weighted according to the number of mates in
the bundle. A "greedy" strategy then attempts to order the bactigs by
selecting bundles of mate-pairs in order of their weight. A selected
mate-pair bundle can tie together two formative scaffolds. It is
incorporated to form a single scaffold only if it is consistent with
the majority of links between contigs of the scaffold. Once scaffolding
is complete, gaps are filled by the "Stones" strategy described above
for the WGA assembler.
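
The greedy bundle selection described above can be sketched as follows.
This is a simplified illustration and not the combining assembler
itself: it only groups bactigs into formative scaffolds with a
union-find structure and does not compute order or orientation. The
function name, the `Bundle` tuple layout, and the `is_consistent`
callback (a stand-in for the majority-of-links test) are all
assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Bundle = Tuple[str, str, int]  # (bactig_a, bactig_b, number of supporting mate pairs)


def greedy_scaffold(
    bactigs: List[str],
    bundles: List[Bundle],
    is_consistent: Callable[[Bundle], bool] = lambda bundle: True,
) -> List[List[str]]:
    """Group bactigs into scaffolds by taking mate-pair bundles heaviest first."""
    parent: Dict[str, str] = {b: b for b in bactigs}

    def find(x: str) -> str:
        # Follow parent pointers to the set representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for bundle in sorted(bundles, key=lambda b: b[2], reverse=True):
        a, b, _weight = bundle
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # both bactigs already sit in the same formative scaffold
        if is_consistent(bundle):
            parent[root_b] = root_a  # join the two formative scaffolds

    scaffolds: Dict[str, List[str]] = {}
    for b in bactigs:
        scaffolds.setdefault(find(b), []).append(b)
    return list(scaffolds.values())


example = greedy_scaffold(
    ["b1", "b2", "b3", "b4"],
    [("b1", "b2", 7), ("b2", "b3", 2), ("b3", "b4", 5)],
    is_consistent=lambda bundle: bundle[2] >= 3,  # hypothetical rule for the example
)
print(example)
```

In the toy input the weakest bundle (two mates) is rejected by the
stand-in rule, so the four bactigs end up in two scaffolds.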

The GenBank data for the Phase 1 and 2 BACs consisted of an average of
19.8 bactigs per BAC of average size 8099 bp. Application of the
combining assembler resulted in individual Celera BAC assemblies being
put together into an average of 1.83 scaffolds (median of 1 scaffold)
consisting of an average of 8.57 contigs of average size 18,973 bp. In
addition to defining order and orientation of the sequence fragments,
there were 57% fewer gaps in the combined result. For Phase 0 data, the
average GenBank entry consisted of 91.52 reads of average length 784
bp. Application of the combining assembler resulted in an average of
54.8 scaffolds consisting of an average of 58.1 contigs of average size
873 bp. Basically, some small amount of assembly took place, but not
enough Celera data were matched to truly assemble the 0.5X to 1X data
set represented by the typical Phase 0 BACs. The combining assembler
was also applied to the Phase 3 BACs for SNP identification,
confirmation of assembly, and localization of the Celera reads. The
Phase 0 data suggest that a combined whole-genome shotgun data set and
1X light-shotgun of BACs will not yield good assembly of BAC regions;
at least 3X light-shotgun of each BAC is needed.

The 5.89 million Celera fragments not matching the GenBank data were
assembled with our whole-genome assembler. The assembly resulted in a
set of scaffolds totaling 442 Mbp in span and consisting of 326 Mbp of
sequence. More than 20% of the scaffolds were >5 kbp long, and these
averaged 63% sequence and 27% gaps with a total of 302 Mbp of sequence.
All scaffolds >5 kbp were forwarded along with all scaffolds produced
by the combining assembler to the subsequent tiling phase.

At this stage, we typically had one or two scaffolds for every BAC
region constituting at least 95% of the relevant sequence, and a
collection of disjoint Celera-unique scaffolds. The next step in
developing the genome components was to determine the order and overlap
tiling of these BAC and Celera-unique scaffolds across the genome. For
this, we used Celera's 50-kbp mate-pair information and BAC-end pairs
and sequence tagged site (STS) markers (44) to provide long-range
guidance and chromosome separation. Given the relatively manageable
number of scaffolds, we chose not to produce this tiling in a fully
automated manner, but to compute an initial tiling with a good
heuristic and then use human curators to resolve discrepancies or
missed join opportunities. To this end, we developed a graphical user
interface that displayed the graph of tiling overlaps and the evidence
for each. A human curator could then explore the implication of mapped
STS data, dot-plots of sequence overlap, and a visual display of the
mate-pair evidence supporting a given choice. The result of this
process was a collection of "components," where each component was a
tiled set of BAC and Celera-unique scaffolds that had been
curator-approved. The process resulted in 3845 components with an
estimated span of [illegible].

In order to generate the final CSA, we assembled each component with
the WGA algorithm. As was done in the WGA process, the bactig data were
shredded into a synthetic 2X shotgun data set in order to give the
assembler the freedom to independently assemble the data. By using faux
reads rather than bactigs, the assembly algorithm could correct errors
in the assembly of bactigs and remove chimeric content in a PFP data
entry. Chimeric or contaminating sequence (from another part of the
genome) would not be incorporated into the reassembly of the component
because it did not belong there. In effect, the previous steps in the
CSA process served only to bring together Celera fragments and PFP data
relevant to a large contiguous segment of the genome, wherein we
applied the assembler used for WGA to produce an ab initio assembly of
the region.
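
Shredding a bactig into a synthetic 2X shotgun data set might look like
the sketch below. The 550-bp read length, the even spacing, and the
function name are assumptions for illustration; this passage does not
state the parameters actually used.

```python
from typing import List


def shred(sequence: str, read_len: int = 550, coverage: float = 2.0) -> List[str]:
    """Cut a sequence into evenly spaced, overlapping faux reads."""
    if len(sequence) <= read_len:
        return [sequence]
    step = max(1, int(read_len / coverage))  # start a new read every `step` bases
    starts = list(range(0, len(sequence) - read_len + 1, step))
    if starts[-1] != len(sequence) - read_len:
        starts.append(len(sequence) - read_len)  # make sure the tail is covered
    return [sequence[s:s + read_len] for s in starts]


bactig = "ACGT" * 500  # 2,000-bp toy sequence
reads = shred(bactig)
depth = sum(len(r) for r in reads) / len(bactig)
print(len(reads), round(depth, 2))
```

Because each faux read carries no memory of the bactig it came from,
the assembler is free to place it wherever the overlaps and mate pairs
support, which is the property the text relies on.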

WGA assembly of the components resulted in a set of scaffolds totaling
2.906 Gbp in span and consisting of 2.654 Gbp of sequence. The chaff,
or set of reads not incorporated into the assembly, numbered 6.17
million, or 22%. More than 90.0% of the genome was covered by scaffolds
spanning >100 kbp, and these averaged 92.2% sequence and 7.8% gaps with
a total of 2.492 Gbp of sequence. There were a total of 105,264 gaps
among the 107,199 contigs that belong to the 1940 scaffolds spanning
>100 kbp. The average scaffold size was 1.4 Mbp, the average contig
size was 23.24 kbp, and the average gap size was 2.0 kbp, where each
distribution of sizes was exponential. As such, averages tend to be
underrepresentative of the majority of the data. Figure 5 shows a
histogram of the bases in scaffolds of various size ranges. Consider
also that more than 49% of all gaps were <500 bp long, more than 62% of
all gaps were <1 kbp, and all gaps are <100 kbp long. Similarly, more
than 73% of the sequence is in contigs >30 kbp, more than 49% is in
contigs >100 kbp, and the largest contig was 1.99 Mbp long. Table 3
provides summary statistics for the structure of this assembly with a
direct comparison to the WGA assembly.
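
The averages quoted in this paragraph follow from the raw counts in
Table 3. As a worked check, the sketch below recomputes them for the
CSA scaffolds >100 kbp; the variable names are illustrative, and the
inputs are the Table 3 entries for that column.

```python
# CSA scaffolds >100 kbp, counts as listed in Table 3.
span_bp = 2_700_489_905      # bp in scaffolds, including intrascaffold gaps
contig_bp = 2_491_538_372    # bp in contigs
n_scaffolds = 1_935
n_contigs = 107_199
n_gaps = 105_264

sequence_fraction = contig_bp / span_bp
avg_scaffold = span_bp / n_scaffolds
avg_contig = contig_bp / n_contigs
avg_gap = (span_bp - contig_bp) / n_gaps

print(f"sequence: {sequence_fraction:.1%}, gaps: {1 - sequence_fraction:.1%}")
print(f"average scaffold: {avg_scaffold:,.0f} bp")
print(f"average contig:   {avg_contig:,.0f} bp")
print(f"average gap:      {avg_gap:,.0f} bp")
```

Note that the number of intrascaffold gaps equals the number of contigs
minus the number of scaffolds, since each scaffold of k contigs
contains k - 1 gaps.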



2.5 Comparison of the WGA and CSA scaffolds

Having obtained two assemblies of the human genome via independent
computational processes (WGA and CSA), we compared scaffolds from the
two assemblies as another means of investigating their completeness,
consistency, and contiguity. From each assembly, a set of reference
scaffolds containing at least 1000 fragments (Celera sequencing reads
or bactig shreds) was obtained; this amounted to 2218 WGA scaffolds and
1717 CSA scaffolds, for a total of 2.087 Gbp and 2.474 Gbp. The
sequence of each reference scaffold was compared to the sequence of all
scaffolds from the other assembly with which it shared at least 20
fragments or at least 20% of the fragments of the smaller scaffold. For
each such comparison, all matches of at least 200 bp with at most 2%
mismatch were tabulated.

From this tabulation, we estimated the amount of unique sequence in
each assembly in two ways. The first was to determine the number of
bases of each assembly that were not covered by a matching segment in
the other assembly. Some 82.5 Mbp of the WGA (3.95%) was not covered by
the CSA, whereas 204.5 Mbp (8.26%) of the CSA was not covered by the
WGA. This estimate did not require any consistency of the assemblies or
any uniqueness of the matching segments. Thus, another analysis was
conducted in which matches of less than 1 kbp between a pair of
scaffolds were excluded unless they were confirmed by other matches
having a consistent order and orientation. This gives some measure of
consistent coverage: 1.982 Gbp (95.00%) of the WGA is covered by the
CSA, and 2.169 Gbp (87.69%) of the CSA is covered by the WGA by this
more stringent measure.
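
The first estimate, bases not covered by any matching segment, amounts
to taking the union of match intervals on a scaffold and subtracting
its size from the scaffold length. A minimal sketch, assuming matches
are given as half-open (start, end) coordinates on one scaffold; the
function name and the toy coordinates are illustrative.

```python
from typing import List, Tuple


def uncovered_bases(length: int, matches: List[Tuple[int, int]]) -> int:
    """Count bases in [0, length) not covered by any half-open match interval."""
    covered = 0
    current_end = 0  # right edge of the region already counted
    for start, end in sorted(matches):
        start = max(start, current_end, 0)
        end = min(end, length)
        if end > start:
            covered += end - start
            current_end = end
    return length - covered


# Toy scaffold of 10,000 bp with three matches, two of which overlap.
print(uncovered_bases(10_000, [(0, 3_000), (2_500, 6_000), (8_000, 9_000)]))
```

Overlapping matches are counted once, which is why this first estimate
needs neither uniqueness nor consistent ordering of the matches.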

The comparison of WGA to CSA also permitted evaluation of scaffolds for
structural inconsistencies. We looked for instances in which a large
section of a scaffold from one assembly matched only one scaffold from
the other assembly, but failed to match over the full length of the
overlap implied by the matching segments. An initial set of candidates
was identified automatically, and then each candidate was inspected by
hand. From this process, we identified 31 instances in which the
assemblies appear to disagree in a nonlocal fashion. These cases are
being further evaluated to determine which assembly is in error and
why.

In addition, we evaluated local inconsistencies of order or
orientation. The following results exclude cases in which one contig in
one assembly corresponds to more than one overlapping contig in the
other assembly (as long as the order and orientation of the latter
agrees with the positions they match in the former). Most of these
small rearrangements involved segments on the order of hundreds of base
pairs and rarely >1 kbp. We found a total of 295 kbp (0.012%) in the
CSA assemblies that were locally inconsistent with the WGA assemblies,
whereas 2.108 Mbp (0.1%) in the WGA assembly were inconsistent with the
CSA assembly.




THE HUMAN GENOME 

The CSA assembly was a few percentage points better in terms of
coverage and slightly more consistent than the WGA, because it was in
effect performing a few thousand shotgun assemblies of megabase-sized
problems, whereas the WGA is performing a shotgun assembly of a
gigabase-sized problem. When one considers the increase of
two-and-a-half orders of magnitude in problem size, the information
loss between the two is remarkably small. Because CSA was logistically
easier to deliver and the better of the two results available at the
time when downstream analyses needed to be begun, all subsequent
analysis was performed on this assembly.

2.6 Mapping scaffolds to the genome

The final step in assembling the genome was to order and orient the
scaffolds on the chromosomes. We first grouped scaffolds together on
the basis of their order in the components from CSA. These grouped
scaffolds were reordered by examining residual mate-pairing data
between the scaffolds. We next mapped the scaffold groups onto the
chromosome using physical mapping data. This step depends on having
reliable high-resolution map information such that each scaffold will
overlap multiple markers. There are two genome-wide types of map
information available: high-density STS maps and fingerprint maps of
BAC clones developed at Washington University (45). Among the
genome-wide STS maps, GeneMap99 (GM99) has the most markers and
therefore was most useful for mapping scaffolds. The two different
mapping approaches are complementary to one another. The fingerprint
maps should have better local order because they were built by
comparison of overlapping BAC clones. On the other hand, GM99 should
have a more reliable long-range order, because the framework markers
were derived from well-validated genetic maps. Both types of maps were
used as a reference for human curation of the components that were the
input to the regional assembly, but they did not determine the order of
sequences produced by the assembler.



Fig. 5. Distribution of scaffold sizes of the CSA. For each range of
scaffold sizes, the percent of sequence is indicated. (Histogram;
x-axis: scaffold size, in bins <30 kb, 30-50 kb, 50-100 kb, 100-500 kb,
0.5-1 Mb, 1-5 Mb, 5-10 Mb, and >10 Mb.)



In order to determine the effectiveness of the fingerprint maps and
GM99 for mapping scaffolds, we first examined the reliability of these
maps by comparison with large scaffolds. Only 1% of the STS markers on
the 10 largest scaffolds (those >9 Mbp) were mapped on a different
chromosome on GM99. Two percent of the STS markers disagreed in
position by more than five framework bins. However, for the fingerprint
maps, a 2% chromosome discrepancy was observed, and on average 23.8% of
BAC locations in the scaffold sequence disagreed with fingerprint map
placement by more than five BACs. When further examining the source of
discrepancy, it was found that most of the discrepancy came from 4 of
the 10 scaffolds, indicating that there is variation in the quality of
either the map or the scaffolds. All four scaffolds were assembled as
well as the other six, as judged by clone coverage analysis, and showed
the same low discrepancy rate to GM99, and thus we concluded that the
fingerprint map global order in these cases was not reliable. Smaller
scaffolds had a higher discordance rate with GM99 (4.21% of STSs were
discordant by more than five framework bins), but a lower discordance
rate with the fingerprint maps (11% of BACs disagreed with fingerprint
maps by more than five BACs). This observation agrees with the clone
coverage analysis (46) that Celera scaffold construction was better
supported by long-range mate pairs in larger scaffolds than in small
scaffolds.

We created two orderings of Celera scaffolds on the basis of the
markers (BAC or STS) on these maps. Where the order of scaffolds agreed
between GM99 and the WashU BAC map, we had a high degree of confidence
that that order was correct; these scaffolds were termed "anchor
scaffolds." Only scaffolds with a low overall discrepancy rate with
both maps were considered anchor scaffolds. Scaffolds in GM99 bins were
allowed to permute in their order to match WashU ordering, provided
they did not violate their framework orders. Orientation of individual
scaffolds was determined by the presence of multiple mapped markers
with consistent order. Scaffolds with only one marker have insufficient
information to assign orientation. We found 70.1% of the genome in
anchored scaffolds, more than 99% of which are also oriented (Table 4).
Because GM99 is of lower resolution than the WashU map, a number of
scaffolds without STS matches could be ordered relative to the anchored
scaffolds because they included sequence from the same or adjacent BACs
on the WashU map. On the other hand, because of occasional WashU global
ordering discrepancies, a number of scaffolds determined to be
"unmappable" on the WashU map could be ordered relative to the anchored
scaffolds with GM99. These scaffolds were termed "ordered scaffolds."
We found that 13.9% of the assembly could be ordered by these
additional methods, and thus 84.0% of the genome was ordered
unambiguously.

Next, all scaffolds that could be placed, but not ordered, between
anchors were assigned to the interval between the anchored scaffolds
and were deemed to be "bounded" between them. For example, small
scaffolds having STS hits from the same GeneMap bin or hitting the same
BAC cannot be ordered relative to each other, but can be assigned a
placement boundary relative to other anchored or ordered scaffolds. The
remaining scaffolds either had no localization information, conflicting
information, or could only be assigned to a generic chromosome
location. Using the above approaches, ~98% of the genome was anchored,
ordered, or bounded.

Finally, we assigned a location for each scaffold placed on the
chromosome by spreading out the scaffolds per chromosome. We assumed
that the remaining unmapped scaffolds, constituting 2% of the genome,
were distributed evenly across the genome. By dividing the sum of
unmapped scaffold lengths with the sum of the number of mapped
scaffolds, we arrived at an estimate of interscaffold gap of 1483 bp.
This gap was used to separate all the scaffolds on each chromosome and
to assign an offset in the chromosome.
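
Assigning chromosome offsets with a fixed interscaffold gap is a running
sum. The sketch below assumes the scaffolds for one chromosome are
already in their final order; the function name, the tuple layout, and
the toy lengths are illustrative.

```python
from typing import List, Tuple

INTERSCAFFOLD_GAP = 1483  # bp, the estimate quoted in the text


def assign_offsets(
    ordered_scaffolds: List[Tuple[str, int]], gap: int = INTERSCAFFOLD_GAP
) -> List[Tuple[str, int]]:
    """Return (scaffold_id, start_offset) for scaffolds given in chromosome order."""
    offsets = []
    position = 0
    for scaffold_id, length in ordered_scaffolds:
        offsets.append((scaffold_id, position))
        position += length + gap  # next scaffold starts after this one plus a gap
    return offsets


layout = assign_offsets([("s1", 1_500_000), ("s2", 40_000), ("s3", 900_000)])
print(layout)
```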

During the scaffold-mapping effort, we encountered many problems that
resulted in additional quality assessment and validation analysis. At
least 978 (3% of 33,173) BACs were believed to have sequence data from
more than one location in the genome (47). This is consistent with the
bactig chimerism analysis reported above in the Assembly Strategies
section. These BACs could not be assigned to unique positions within
the CSA assembly and thus could not be used for ordering scaffolds.
Likewise, it was not always possible to assign STSs to unique locations
in the assembly because of genome duplications, repetitive elements,
and pseudogenes.

Because of the time required for an exhaustive search for a perfect
overlap, CSA generated 21,607 intrascaffold gaps where the mate-pair
data suggested that the contigs should overlap, but no overlap was
found. These gaps were defined as a fixed 50 bp in length and make up
18.6% of the total 116,442 gaps in the CSA assembly.

We chose not to use the order of exons implied in cDNA or EST data as a
way of ordering scaffolds. The rationale for not using this data was
that doing so would have biased certain regions of the assembly by
rearranging scaffolds to fit the transcript data and made validation of
both the assembly and gene definition processes more difficult.
2.7 Assembly and validation analysis

We analyzed the assembly of the genome from the perspectives of
completeness (amount of coverage of the genome) and correctness (the
structural accuracy of the order and orientation and the consensus
sequence of the assembly).

Completeness. Completeness is defined as the percentage of the
euchromatic sequence represented in the assembly. This cannot be known
with absolute certainty until the euchromatin sequence has been
completed. However, it is possible to estimate completeness on the
basis of (i) the estimated sizes of intrascaffold gaps; (ii) coverage
of the two published chromosomes, 21 and 22 (48, 49); and (iii)
analysis of the percentage of an independent set of random sequences
(STS markers) contained in the assembly. The whole-genome libraries
contain heterochromatic sequence and, although no attempt has been made
to assemble it, there may be instances of unique sequence embedded in
regions of heterochromatin as were observed in Drosophila (50, 51).

The sequences of human chromosomes 21 and 22 have been completed to
high quality and published (48, 49). Although this sequence served as
input to the assembler, the finished sequence was shredded into a
shotgun data set so that the assembler had the opportunity to assemble
it differently from the original sequence in the case of structural
polymorphisms or assembly errors in the BAC data. In particular, the
assembler must be able to resolve repetitive elements at the scale of
components (generally multimegabase in size), and so this comparison
reveals the level to which the assembler resolves repeats. In certain
areas, the assembly structure differs from the published versions of
chromosomes 21 and 22 (see below). The flexibility to assemble
"finished" sequence differently on the basis of Celera data resulted in
an assembly with more segments than the chromosome 21 and 22 sequences.
We examined the reasons why there are more gaps in the Celera sequence
than in chromosomes 21 and 22 and expect that they may be typical of
gaps in other regions of the genome. In the Celera assembly, there are
25 scaffolds, each containing at least 10 kb of sequence, that
collectively span 94.3% of chromosome 21. Sixty-two scaffolds span
95.7% of chromosome 22. The total length of the gaps remaining in the
Celera assembly for these two chromosomes is 3.4 Mbp. These gap
sequences were analyzed by RepeatMasker and by searching against the
entire genome assembly (52). About 50% of the gap sequence consisted of
common repetitive elements identified by RepeatMasker; more than half
of the remainder was lower copy number repeat elements.

A more global way of assessing completeness is to measure the content
of an independent set of sequence data in the assembly. We compared
48,938 STS markers from Genemap99 (51) to the scaffolds. Because these
markers were not used in the assembly processes, they provided a truly
independent measure of completeness. ePCR (53) and BLAST (54) were used
to locate STSs on the assembled genome. We found 44,524 (91%) of the
STSs in the mapped genome. An additional 2648 markers (5.4%) were found
by searching the unassembled data or "chaff." We identified 1283 STS
markers (2.6%) not found in either Celera sequence or BAC data as of
September 2000, raising the possibility that these markers may not be
of human origin. If that were the case, the Celera assembled sequence
would represent 93.4% of the human genome and the unassembled data
5.5%, for a total of 98.9% coverage. Similarly, we compared CSA against
36,678 TNG radiation hybrid markers (55a) using the same method. We
found that 32,371 markers (88%) were located in the mapped CSA
scaffolds, with 2055 markers (5.6%) found in the remainder. This gave a
94% coverage of the genome through another genome-wide survey.
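
The STS-based completeness figures can be reproduced from the marker
counts. The sketch below is plain arithmetic on the numbers quoted
above; the variable and function names are illustrative. The recomputed
percentages agree with the quoted ones to within about 0.1 percentage
point.

```python
total_sts = 48_938
in_mapped_assembly = 44_524
in_chaff = 2_648
not_found = 1_283


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


print(f"mapped:    {pct(in_mapped_assembly, total_sts):.1f}%")
print(f"chaff:     {pct(in_chaff, total_sts):.1f}%")
print(f"not found: {pct(not_found, total_sts):.1f}%")

# Recompute after setting aside the markers found in neither data set.
human_sts = total_sts - not_found
assembled = pct(in_mapped_assembly, human_sts)
unassembled = pct(in_chaff, human_sts)
print(f"assembled: {assembled:.1f}%, unassembled: {unassembled:.1f}%")
```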

Table 4. Summary of scaffold mapping. Scaffolds were mapped to the
genome with different levels of confidence (anchored scaffolds have the
highest confidence; unmapped scaffolds have the lowest). Anchored
scaffolds were consistently ordered by the WashU BAC map and GM99.
Ordered scaffolds were consistently ordered by at least one of the
following: the WashU BAC map, GM99, or component tiling path. Bounded
scaffolds had order conflicts between at least two of the external
maps, but their placements were adjacent to a neighboring anchored or
ordered scaffold. Unmapped scaffolds had, at most, a chromosome
assignment. The scaffold subcategories are given below each category.

Mapped scaffold             Number       Length (bp)    % of total
category                                                    length

Anchored                     1,526     1,860,676,676            70
  Oriented                   1,246     1,852,088,645            70
  Unoriented                   280         8,588,031           0.3
Ordered                      2,001       369,235,857            14
  Oriented                     839       329,633,166            12
  Unoriented                 1,162        39,602,691             2
Bounded                     38,241       368,753,463            14
  Oriented                   7,453       274,536,424            10
  Unoriented                30,788        94,217,039             4
Unmapped                    11,823        55,313,737             2
  Known chromosome             281         2,505,844           0.1
  Unknown chromosome        11,542        52,807,893             2

Correctness. Correctness is defined as the structural and sequence
accuracy of the assembly. Because the source sequences for the Celera
data and the GenBank data are from different individuals, we could not
directly compare the consensus sequence of the assembly against other
finished sequence for determining sequencing accuracy at the nucleotide
level, although this has been done for identifying polymorphisms as
described in Section 6. The accuracy of the consensus sequence is at
least 99.96% on the basis of a statistical estimate derived from the
quality values of the underlying reads.

The structural consistency of the assembly can be measured by mate-pair
analysis. In a correct assembly, every mated pair of sequencing reads
should be located on the consensus sequence with the correct separation
and orientation between the pairs. A pair is termed "valid" when the
reads are in the correct orientation and the distance between them is
within the mean ± 3 standard deviations of the distribution of insert
sizes of the library from which the pair was sampled. A pair is termed
"misoriented" when the reads are not correctly oriented, and is termed
"misseparated" when the distance between the reads is not in the
correct range but the reads are correctly oriented. The mean ± the
standard deviation of each library used by the assembler was determined
as described above. To validate these, we examined all reads mapped to
the finished sequence of chromosome 21 (48) and determined how many
incorrect mate pairs there were as a result of laboratory tracking
errors and chimerism (two different segments of the genome cloned into
the same plasmid), and how tight the distribution of insert sizes was
for those that were correct (Table 5). The standard deviations for all
Celera libraries were quite small, less than 15% of the insert length,
with the exception of a few 50-kbp libraries. The 2- and 10-kbp
libraries contained less than 2% invalid mate pairs, whereas the 50-kbp
libraries were somewhat higher (~10%). Thus, although the mate-pair
information was not perfect, its accuracy was such that measuring
valid, misoriented, and misseparated pairs with respect to a given
assembly was deemed to be a reliable instrument for validation
purposes, especially when several mate pairs confirm or deny an
ordering.
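
The three-way classification of mate pairs can be written down
directly from the definitions above. The sketch below is illustrative:
the `Read` record, the assumption that correctly oriented mates face
each other ("+" on the left, "-" on the right), and the use of the
outer-to-outer distance as the separation are conventions chosen for
the example, not details given in the text.

```python
from dataclasses import dataclass


@dataclass
class Read:
    start: int    # leftmost coordinate of the read on the consensus
    end: int      # rightmost coordinate (exclusive)
    strand: str   # "+" or "-"


def classify_mate_pair(a: Read, b: Read, mean: float, sd: float) -> str:
    """Return 'valid', 'misoriented', or 'misseparated' for one mate pair."""
    left, right = (a, b) if a.start <= b.start else (b, a)
    # Assumed convention: mates point toward each other across the insert.
    if not (left.strand == "+" and right.strand == "-"):
        return "misoriented"
    separation = right.end - left.start  # outer-to-outer distance
    if abs(separation - mean) <= 3 * sd:
        return "valid"
    return "misseparated"


# Toy library with mean 2,000 bp and SD 100 bp (illustrative values).
print(classify_mate_pair(Read(0, 500, "+"), Read(1_600, 2_100, "-"), 2_000, 100))
print(classify_mate_pair(Read(0, 500, "+"), Read(4_000, 4_500, "-"), 2_000, 100))
print(classify_mate_pair(Read(0, 500, "+"), Read(1_600, 2_100, "+"), 2_000, 100))
```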

The clone coverage of the genome was 39X, meaning that any given base
pair was, on average, contained in 39 clones or, equivalently, spanned
by 39 mate-paired reads. Areas of low clone coverage or areas with a
high proportion of invalid mate pairs would indicate potential assembly
problems. We computed the coverage of each base in the assembly by
valid mate pairs (Table 6). In summary, for scaffolds >30 kbp in
length, less than 1% of the Celera assembly was in regions of less than
3X clone coverage. Thus, more than 99% of the assembly, including order
and orientation, is strongly supported by this measure alone.
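
Per-base clone coverage from a set of valid mate pairs can be computed
with a difference array, as sketched below. The half-open clone spans,
the function names, and the toy data are assumptions for illustration;
spans are expected to start within the scaffold.

```python
from itertools import accumulate
from typing import List, Tuple


def clone_coverage(length: int, clones: List[Tuple[int, int]]) -> List[int]:
    """Per-base count of clones (half-open [start, end) spans) covering each base."""
    delta = [0] * (length + 1)
    for start, end in clones:
        delta[max(start, 0)] += 1      # coverage rises where a clone begins
        delta[min(end, length)] -= 1   # and falls just past where it ends
    return list(accumulate(delta[:-1]))


def fraction_below(coverage: List[int], threshold: int) -> float:
    """Fraction of bases whose coverage is below the threshold."""
    return sum(1 for c in coverage if c < threshold) / len(coverage)


cov = clone_coverage(10, [(0, 6), (2, 8), (4, 10), (5, 7)])
print(cov, fraction_below(cov, 3))
```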

We examined the locations and number of all misoriented and
misseparated mates. In addition to doing this analysis on the CSA
assembly (as of 1 October 2000), we also performed a study of the PFP
assembly as of 5 September 2000 (30, 55b). In this latter case, Celera
mate pairs had to be mapped to the PFP assembly. To avoid mapping
errors due to high-fidelity repeats, the only pairs mapped were those
for which both reads matched at only one location with less than 6%
differences. A threshold was set such that sets of five or more
simultaneously invalid mate pairs indicated a potential breakpoint,
where the construction of the two assemblies differed. The graphic
comparison of the CSA chromosome 21 assembly with the published
sequence (Fig. 6A) serves as a validation of this methodology. Blue
tick marks in the panels indicate breakpoints. There were a similar
(small) number of breakpoints on both chromosome sequences. The
exception was 12 sets of scaffolds in the Celera assembly (a total of
3% of the chromosome length in 212 single-contig scaffolds) that were
mapped to the wrong positions because they were too small to be mapped
reliably. Figures 6 and 7 and Table 6 illustrate the mate-pair
differences and breakpoints between the two assemblies. There was a
higher percentage of misoriented and misseparated mate pairs in the
large-insert libraries (50 kbp and BAC ends) than in the small-insert
libraries in both assemblies (Table 6). The large-insert libraries are
more likely to identify discrepancies simply because they span a larger
segment of the genome. The graphic comparison between the two
assemblies for chromosome 8 (Fig. 6, B and C) shows that there are many


Table 5. Mate-pair validation. Celera fragment sequences were mapped to
the published sequence of chromosome 21. Each mate pair uniquely
mapped was evaluated for correct orientation and placement (number
of mate pairs tested). If the two mates had incorrect relative orienta-
tion or placement, they were considered invalid (number of invalid mate
pairs).

                  Chromosome 21                                                              Genome
Library  Library  Mean insert  SD       SD/mean  No. of mate   No. of invalid  %             Mean insert  SD       SD/mean
type     no.      size (bp)    (bp)     (%)      pairs tested  mate pairs      invalid       size (bp)    (bp)     (%)
2 kbp    1        2,0[?]1      106      5.1      3,642         38              1.0           2,082        90       4.3
         2        1,913        152      7.9      28,029        413             1.5           1,923        118      6.1
         3        2,166        175      8.1      4,405         57              1.3           2,162        158      7.3
10 kbp   4        11,385       851      7.5      4,319         80              1.9           11,370       696      6.1
         5        14,5[?]3     1,875    12.9     7,355         156             2.1           14,142       1,402    9.9
         6        9,635        1,035    10.7     5,573         109             2.0           9,606        934      9.7
         7        10,223       928      9.1      34,079        399             1.2           10,190       777      7.6
50 kbp   8        64,888       2,747    4.2      16            1               6.3           65,500       5,304    8.4
         9        53,410       5,834    10.9     914           170             18.6          53,311       5,546    10.4
         10       52,034       7,312    14.1     5,871         569             9.7           51,498       6,588    12.8
         11       52,282       7,454    14.3     2,629         213             8.1           52,282       7,454    14.3
         12       46,616       7,378    15.8     2,153         215             10.0          45,418       9,068    20.0
         13       55,788       10,099   18.1     2,244         249             11.1          53,062       10,893   20.5
         14       39,894       5,019    12.6     199           7               3.5           36,838       9,988    27.1
         15       48,931       9,813    20.1     144           10              6.9           47,845       4,774    10.0
         16       48,130       4,232    8.8      195           14              7.2           47,924       4,581    9.6
BES      17       106,027      27,778   26.2     330           16              4.8           152,000      26,600   17.5
         18       160,575      54,973   34.2     155           8               5.2           161,750      27,000   16.7
         19       164,155      19,453   11.9     642           44              6.9           176,500      19,500   11.05
Sum                                              102,894       2,768           (mean = 2.7)
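
The validation summarized in Table 5 can be sketched in a few lines. This is an illustration only: the `MatePair` record, the function names, and the tolerance of three standard deviations (`n_sd`) are assumptions introduced here, since the table caption states only that orientation and placement were checked.

```python
from dataclasses import dataclass

@dataclass
class MatePair:
    pos_a: int      # mapped start of the first read (bp)
    pos_b: int      # mapped start of the second read (bp)
    strand_a: str   # '+' or '-'
    strand_b: str   # '+' or '-'

def is_valid(mp, mean_insert, sd, n_sd=3.0):
    """True if the mates face each other and their separation lies
    within n_sd standard deviations of the library mean.
    The n_sd tolerance is an assumption, not a value from the text."""
    if mp.pos_a <= mp.pos_b:
        left, right = mp.strand_a, mp.strand_b
    else:
        left, right = mp.strand_b, mp.strand_a
    oriented = left == '+' and right == '-'
    separation = abs(mp.pos_b - mp.pos_a)
    return oriented and abs(separation - mean_insert) <= n_sd * sd

def percent_invalid(n_tested, n_invalid):
    """Percentage of tested mate pairs that were invalid."""
    return 100.0 * n_invalid / n_tested

# Counts taken from Table 5: one 50-kbp library and the column sums.
print(round(percent_invalid(914, 170), 1))
print(round(percent_invalid(102894, 2768), 1))
```

The same ratio, applied row by row, reproduces the "% invalid" column from the two count columns.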
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gene boundaries. During this process, multiple 
hits to the same region were collapsed to a 
coherent set of data by tracking the coverage of 
a region. For example, if a group of bases was
represented by multiple overlapping ESTs, the
union of these regions matched by the set of
ESTs on the scaffold was marked as being
supported by EST evidence. This resulted in a
series of "gene bins," each of which was be-
lieved to contain a single gene. One weakness of
this initial implementation of the algorithm was
in predicting gene boundaries in regions of tan-
demly duplicated genes. Gene clusters frequent-
ly resulted in homologous neighboring genes
being joined together, resulting in an annotation
that artificially concatenated these gene models.
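
Collapsing multiple hits into the union of the regions they cover is an interval-merging step. The sketch below shows one way to do it, assuming hits are given as half-open `(start, end)` coordinates on a scaffold; the function name `merge_hits` is hypothetical and is not taken from the Otto system.

```python
def merge_hits(hits):
    """Collapse overlapping or touching (start, end) hits into their union.
    Coordinates are assumed to be half-open intervals on one scaffold."""
    merged = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            # Hit overlaps the current region: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # Hit starts a new region.
            merged.append([start, end])
    return [tuple(region) for region in merged]

# Three EST hits, two of which overlap.
print(merge_hits([(150, 300), (100, 200), (500, 600)]))
```

The weakness noted above follows from this design: tandemly duplicated genes whose evidence overlaps end up in a single merged region.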

Next, known genes (those with exact match-
es of a full-length cDNA sequence to the ge-
nome) were identified, and the region corre-
sponding to the cDNA was annotated as a
predicted transcript. A subset of the curat-
ed human gene set RefSeq from the Nation-
al Center for Biotechnology Information
(NCBI) was included as a data set searched in
the computational pipeline. If a RefSeq tran-
script matched the genome assembly for at least
50% of its length at >92% identity, then the
SIM4 (63) alignment of the RefSeq transcript to
the region of the genome under analysis was
promoted to the status of an Otto annotation.
Because the genome sequence has gaps and
sequence errors such as frameshifts, it was not
always possible to predict a transcript that
agrees precisely with the experimentally deter-
mined cDNA sequence. A total of 6538 genes
in our inventory were identified and transcripts
predicted in this way.

Regions that have a substantial amount of
sequence similarity, but do not match known 
genes, were analyzed by that part of the Otto 
system that uses the sequence similarity in- 
formation to predict a transcript. Here, Otto 
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Fig. 6. Comparison of the CSA and the PFP assembly. 
(A) All of chromosome 21, (B) all of chromosome 8, 
and (C) a 1-Mb region of chromosome 8 representing 
a single Celera scaffold. To generate the figure, Celera
fragment sequences were mapped onto each assem-
bly. The PFP assembly is indicated in the upper third
of each panel; the Celera assembly is indicated in the
lower third. In the center of the panel, green lines
show Celera sequences that are in the same order and
orientation in both assemblies and form the longest
consistently ordered run of sequences. Yellow lines
indicate sequence blocks that are in the same orien-
tation, but out of order. Red lines indicate sequence
blocks that are not in the same orientation. For
clarity, in the latter two cases, lines are only drawn
between segments of matching sequence that are at
least 50 kbp long. The top and bottom thirds of each
panel show the extent of Celera mate-pair violations
(red, misoriented; yellow, incorrect distance between
the mates) for each assembly grouped by library size.
(Mate pairs that are within the correct distance, as
expected from the mean library insert size, are omit-
ted from the figure for clarity.) Predicted breakpoints,
corresponding to stacks of violated mate pairs of the
same type, are shown as blue ticks on each assembly
axis. Runs of more than 10,000 Ns are shown as cyan
bars. Plots of all 24 chromosomes can be seen in Web
fig. 3 on Science Online at www.sciencemag.org/cgi/
content/full/291/5507/1304/DC1.
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evaluates evidence generated by the compu-
tational pipeline, corresponding to conserva-
tion between mouse and human genomic
DNA, similarity to human transcripts (ESTs 



THE HUMAN GENOME 

and cDNAs), similarity to rodent transcripts 
(ESTs and cDNAs), and similarity of the 
translation of human genomic DNA to known 
proteins to predict potential genes in the hu- 



man genome. The sequence from the region 
of genomic DNA contained in a gene bin was 
extracted, and the subsequences supported by 
any homology evidence were marked (plus 100
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Fig. 7. Schematic view of the distribution of breakpoints and large gaps 
on all chromosomes. For each chromosome, the upper pair of lines 
represent the PFP assembly, and the lower pair of lines represent Celera's
assembly. Blue tick marks represent breakpoints, whereas red tick marks
represent a gap of larger than 10,000 bp. The number of breakpoints per
chromosome is indicated in black, and the chromosome numbers in red.
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bases flanking these regions). The other bases 
in the region, those not covered by any homol-
ogy evidence, were replaced by N's. This se-
quence segment, with high confidence regions
represented by the consensus genomic se-
quence and the remainder represented by N's,
was then evaluated by Genscan to see if a
consistent gene model could be generated. This
procedure simplified the gene-prediction task
by first establishing the boundary for the gene
(not a strength of most gene-finding algo-
rithms), and by eliminating regions with no
supporting evidence. If Genscan returned a
plausible gene model, it was further evaluated
before being promoted to an "Otto" annotation.
The final Genscan predictions were often quite
different from the prediction that Genscan re-
turned on the same region of native genomic
sequence. A weakness of using Genscan to
refine the gene model is the loss of valid, small
exons from the final annotation.
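
The masking step that precedes the Genscan run can be sketched as follows. The function name `mask_unsupported`, the half-open coordinate convention, and the default flank width are assumptions made for illustration; only the idea of keeping evidence-supported bases (plus flanking bases) and writing N elsewhere comes from the text.

```python
def mask_unsupported(seq, supported, flank=100):
    """Return seq with every base outside the supported regions
    (each widened by `flank` bases on both sides) replaced by 'N'.
    `supported` holds half-open (start, end) intervals; the default
    flank width is an assumption for this sketch."""
    keep = [False] * len(seq)
    for start, end in supported:
        lo = max(0, start - flank)
        hi = min(len(seq), end + flank)
        for i in range(lo, hi):
            keep[i] = True
    return "".join(base if k else "N" for base, k in zip(seq, keep))

# Toy example: one supported region of two bases, one flanking base each side.
print(mask_unsupported("ACGTACGTAC", [(4, 6)], flank=1))
```

Because short exons without homology evidence fall in the masked portion, a sketch like this also makes the stated weakness concrete: anything not covered by evidence is invisible to the gene finder.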

The next step in defining gene structures
based on sequence similarity was to compare
each predicted transcript with the homology-
based evidence that was used in previous steps
to evaluate the depth of evidence for each exon
in the prediction. Internal exons were consid-
ered to be supported if they were covered by
homology evidence to within ±10 bases of
their edges. For first and last exons, the internal
edge was required to be within 10 bases, but the
external edge was allowed greater latitude to
allow for 5' and 3' untranslated regions
(UTRs). To be retained, a prediction for a
multi-exon gene must have evidence such that
the total number of "hits," as defined above,
divided by the number of exons in the predic-
tion must be >0.66 or must correspond to a
RefSeq sequence. A single-exon gene must be
covered by at least three supporting hits (±10
bases on each side), and these must cover the
complete predicted open reading frame. For
a single-exon gene, we also required that
the Genscan prediction include both a start
and a stop codon. Gene models that did not
meet these criteria were disregarded, and

Table 7. Sensitivity and specificity of Otto and
Genscan. Sensitivity and specificity were calculat-
ed by first aligning the prediction to the published
RefSeq transcript, tallying the number (N) of
uniquely aligned RefSeq bases. Sensitivity is the
ratio of N to the length of the published RefSeq
transcript. Specificity is the ratio of N to the
length of the prediction. All differences are signif-
icant (Tukey HSD; P < 0.001).

Method                 Sensitivity   Specificity
Otto (RefSeq only)*    0.939         0.973
Otto (homology)†       0.604         0.884
Genscan                0.50[?]       0.633

*Refers to those annotations produced by Otto using only
the Sim4-polished RefSeq alignment rather than an evi-
dence-based Genscan prediction. †Refers to those
annotations produced by supplying all available evidence
to Genscan.
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those that passed were promoted to Otto 
predictions. Homology-based Otto predic- 
tions do not contain 3' and 5' untranslated 
sequence. Although three de novo gene-finding 
programs [GRAIL, Genscan, and FgenesH 
(63)] were run as part of the computational
analysis, the results of these programs were not
directly used in making the Otto predictions.
Otto predicted 11,226 additional genes by
means of sequence similarity.
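
The retention rules for homology-based predictions lend themselves to a short sketch. The function and parameter names below are hypothetical, and the treatment of first and last exons (which were given greater latitude on their external edge) is left out for brevity.

```python
def exon_supported(exon, hit, tol=10):
    """An internal exon counts as supported when a homology hit
    reaches to within `tol` bases of both exon edges."""
    return abs(hit[0] - exon[0]) <= tol and abs(hit[1] - exon[1]) <= tol

def retain_prediction(n_exons, n_supported=0, is_refseq=False,
                      n_hits=0, orf_covered=False,
                      has_start=False, has_stop=False):
    """Apply the multi-exon or single-exon rule, depending on n_exons."""
    if n_exons > 1:
        # Multi-exon: hits per exon must exceed 0.66, or the model
        # must correspond to a RefSeq sequence.
        return is_refseq or n_supported / n_exons > 0.66
    # Single-exon: at least three hits covering the whole open reading
    # frame, and the prediction must include a start and a stop codon.
    return n_hits >= 3 and orf_covered and has_start and has_stop

print(retain_prediction(3, n_supported=2))   # 2/3 is just above 0.66
print(retain_prediction(3, n_supported=1))
```

A three-exon model with two supported exons passes, which shows that the 0.66 threshold amounts to requiring support for at least two exons in three.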

3.2 Otto validation 

To validate the Otto homology-based process
and the method that Otto uses to define the
structures of known genes, we compared tran-
scripts predicted by Otto with their correspond-
ing (and presumably correct) transcript from a
set of 4512 RefSeq transcripts for which there
was a unique SIM4 alignment (Table 7). In
order to evaluate the relative performance of
Otto and Genscan, we made three comparisons.
The first involved a determination of the accu-
racy of gene models predicted by Otto with
only homology data other than the correspond-
ing RefSeq sequence (Otto homology in Table
7). We measured the sensitivity (correctly pre-
dicted bases divided by the total length of the
cDNA) and specificity (correctly predicted
bases divided by the sum of the correctly and
incorrectly predicted bases). Second, we exam-
ined the sensitivity and specificity of the Otto
predictions that were made solely with the Ref-
Seq sequence, which is the process that Otto
uses to annotate known genes (Otto-RefSeq).
And third, we determined the accuracy of the
Genscan predictions corresponding to these
RefSeq sequences. As expected, the alignment
method (Otto-RefSeq) was the most accurate,
and Otto-homology performed better than Gen-
scan by both criteria. Thus, 6.1% of true RefSeq
nucleotides were not represented in the Otto-
RefSeq annotations and 2.7% of the nucleotides
in the Otto-RefSeq transcripts were not con-
tained in the original RefSeq transcripts. The
discrepancies could come from legitimate
differences between the Celera assembly
and the RefSeq transcript due to polymor-
phisms, incomplete or incorrect data in the
Celera assembly, errors introduced by Sim4
during the alignment process, or the pres-
ence of alternatively spliced forms in the
data set used for the comparisons.
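
The two accuracy measures reduce to simple ratios over base counts. The sketch below uses made-up counts chosen only so that the results land near the Otto-RefSeq figures quoted above; the function name is hypothetical.

```python
def sensitivity_specificity(n_correct, refseq_len, prediction_len):
    """Sensitivity: correctly predicted bases over the RefSeq length.
    Specificity: correctly predicted bases over the prediction length
    (that is, correct plus incorrect predicted bases)."""
    return n_correct / refseq_len, n_correct / prediction_len

# Illustrative counts only, not data from the study.
sens, spec = sensitivity_specificity(n_correct=939, refseq_len=1000,
                                     prediction_len=965)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
print(f"RefSeq bases missed: {100 * (1 - sens):.1f}%")
print(f"predicted bases not in RefSeq: {100 * (1 - spec):.1f}%")
```

Read this way, the 6.1% and 2.7% figures in the text are the complements of the Otto-RefSeq sensitivity (0.939) and specificity (0.973).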

Because Otto uses an evidence-based ap-
proach to reconstruct genes, the absence of
experimental evidence for intervening exons
may inadvertently result in a set of exons that
cannot be spliced together to give rise to a
transcript. In such cases, Otto may "split genes"
when in fact all the evidence should be com-
bined into a single transcript. We also examined
the tendency of these methods to incorrectly
split gene predictions. These trends are shown
in Fig. 8. Both RefSeq and homology-based
predictions by Otto split known genes into few-
er segments than Genscan alone.



3.3 Gene number 

Recognizing that the Otto system is quite
conservative, we used a different gene-pre-
diction strategy in regions where the ho-
mology evidence was less strong. Here the
results of de novo gene predictions were
used. For these genes, we insisted that a
predicted transcript have at least two of the
following types of evidence to be included
in the gene set for further analysis: protein,
human EST, rodent EST, or mouse genome
fragment matches. This final class of pre-
dicted genes is a subset of the predictions
made by the three gene-finding programs
that were used in the computational pipe-
line. For these, there was not sufficient
sequence similarity information for Otto to
attempt to predict a gene structure. The
three de novo gene-finding programs re-
sulted in about 155,695 predictions, of
which ~76,410 were nonredundant (non-
overlapping with one another). Of these,
57,935 did not overlap known genes or
predictions made by Otto. Only 21,350 of
the gene predictions that did not overlap
Otto predictions were partially supported
by at least one type of sequence similarity
evidence, and 8619 were partially support-
ed by two types of evidence (Table 8).

The sum of this number (21,350) and the
number of Otto annotations (17,764), 39,114,
is near the upper limit for the human gene
complement. As seen in Table 8, if the re-
quirement for other supporting evidence is
made more stringent, this number drops rap-
idly so that demanding two types of evidence
reduces the total gene number to 26,383 and
demanding three types reduces it to ~23,000.
Requiring that a prediction be supported by
all four categories of evidence is too stringent
because it would eliminate genes that encode
novel proteins (members of currently unde-
scribed protein families). No correction for
pseudogenes has been made at this point in
the analysis.
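
The gene totals at each stringency follow from adding the Otto annotations to the de novo predictions that survive the evidence filter. The short check below uses only counts quoted in the text and in Table 8; the variable names are chosen here for illustration.

```python
# Otto annotations: known genes plus homology-based predictions.
otto_known = 6538
otto_homology = 11226
otto_total = otto_known + otto_homology

# De novo predictions not overlapping Otto, by minimum number of
# supporting evidence types (values from the text and Table 8).
de_novo_by_evidence = {1: 21350, 2: 8619, 3: 4947}

totals = {k: otto_total + n for k, n in de_novo_by_evidence.items()}
for k, total in totals.items():
    print(f"at least {k} evidence type(s): {total} genes")
```

The three totals correspond to the 39,114, 26,383, and roughly 23,000 figures given above.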

In a further attempt to identify genes that
were not found by the autoannotation process
or any of the de novo gene finders, we ex-
amined regions outside of gene predictions
that were similar to the EST sequence, and
where the EST matched the genomic se-
quence across a splice junction. After correct-
ing for potential 3' UTRs of predicted genes,
about 2500 such regions remained. Addition
of a requirement for at least one of the fol-
lowing evidence types (homology to mouse
genomic sequence fragments, rodent ESTs,
or cDNAs, or similarity to a known protein)
reduced this number to 1010. Adding this to
the numbers from the previous paragraph
would give us estimates of about 40,000,
27,000, and 24,000 potential genes in the
human genome, depending on the stringency
of evidence considered. Table 8 illustrates the
number of genes and presents the degree of
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confidence based on the supporting evidence.
Transcripts encoded by a set of 26,383 genes
were assembled for further analysis. This set
includes the 6538 genes predicted by Otto on
the basis of matches to known genes, 11,226
transcripts predicted by Otto based on homol-
ogy evidence, and 8619 from the subset of
transcripts from de novo gene-prediction pro-
grams that have two types of supporting ev-
idence. The 26,383 genes are illustrated along
chromosome diagrams in Fig. 1. These are a
very preliminary set of annotations and are
subject to all the limitations of an automated
process. Considerable refinement is still nec-
essary to improve the accuracy of these tran-
script predictions. All the predictions and
descriptions of genes and the associated evi-
dence that we present are the product of
completely computational processes, not ex-
pert curation. We have attempted to enumer-
ate the genes in the human genome in such a
way that we have different levels of confi-
dence based on the amount of supporting
evidence: known genes, genes with good pro-
tein or EST homology evidence, and de novo
gene predictions confirmed by modest ho-
mology evidence.

3.4 Features of human gene 
transcripts 

We estimate the average span for a "typi-
cal" gene in the human DNA sequence to
be about 27,894 bases. This is based on the
average span covered by RefSeq tran-
scripts, used because it represents our high-
est confidence set.

The set of transcripts promoted to gene
annotations varies in a number of ways. As
can be seen from Table 8 and Fig. 9, tran-
scripts predicted by Otto tend to be longer,
having on average about 7.8 exons, whereas
those promoted from gene-prediction pro-
grams average about 3.7 exons. The largest
number of exons that we have identified in a
transcript is 234 in the titin mRNA. Table 8
compares the amounts of evidence that sup-
port the Otto and other predicted transcripts.
For example, one can see that a typical Otto
transcript has 6.99 of its 7.81 exons supported
by protein homology evidence. As would be
expected, the Otto transcripts generally have
more support than do transcripts predicted by
the de novo methods.

4 Genome Structure 

Summary. This section describes several of
the noncoding attributes of the assembled
genome sequence and their correlations with
the predicted gene set. These include an anal-
ysis of G+C content and gene density in the
context of cytogenetic maps of the genome,
an enumerative analysis of CpG islands, and
a brief description of the genome-wide repet-
itive elements.
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4.1 Cytogenetic maps 

Perhaps the most obvious, and certainly the
most visible, element of the structure of
the genome is the banding pattern produced
by Giemsa stain. Chromosomal banding
studies have revealed that about 17% to
20% of the human chromosome comple-
ment consists of C-bands, or constitutive
heterochromatin (64). Much of this hetero-
chromatin is highly polymorphic and con-
sists of different families of alpha satellite
DNAs with various higher order repeat
structures (65). Many chromosomes have
complex inter- and intrachromosomal du-
plications present in pericentromeric re-
gions (66). About 5% of the sequence reads
were identified as alpha satellite sequences;
these were not included in the assembly.



[Fig. 8 legend: Otto (homology); Otto (RefSeq only); Genscan.]
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Fig. 8. Analysis of split genes resulting from different annotation methods. A set of 4512 
Sim4-based alignments of RefSeq transcripts to the genomic assembly were chosen (see the text 
for criteria), and the numbers of overlapping Genscan, Otto (RefSeq only) annotations based solely
on Sim4-polished RefSeq alignments, and Otto (homology) annotations (annotations produced by
supplying all available evidence to Genscan) were tallied. These data show the degree to which
multiple Genscan predictions and/or Otto annotations were associated with a single RefSeq
transcript. The zero class for the Otto-homology predictions shown here indicates that the
Otto-homology calls were made without recourse to the RefSeq transcript, and thus no Otto call
was made because of insufficient evidence.



Table 8. Numbers of exons and transcripts supported by various types of evidence for Otto and de novo gene prediction methods. Highlighted cells indicate
the gene sets analyzed in this paper (boldface, set of genes selected for protein analysis; italic, total set of accepted de novo predictions).

                                           Types of evidence                       No. of lines of evidence*
Method   Quantity                Total     Mouse     Rodent    Protein   Human     ≥1        ≥2        ≥3        4
Otto     Number of transcripts   17,969    17,065    14,881    15,477    16,374    17,968    17,501    15,877    12,451
         Number of exons         141,218   111,174   89,569    108,431   118,869   140,710   127,955   99,574    59,804
De novo  Number of transcripts   58,032    14,463    5,094     8,043     9,220     21,350    8,619     4,947     1,904
         Number of exons         319,935   48,594    19,344    26,264    40,104    79,148    31,130    17,508    6,520

No. of exons per transcript
Otto                             7.84      5.77      6.01      6.99      7.24      7.81      7.19      6.00      4.28
De novo                          5.53      3.17      3.80      3.27      4.36      3.7       3.55      3.42      3.16

*Four kinds of evidence (conservation in 3X mouse genomic DNA, similarity to human EST or cDNA, similarity to rodent EST or cDNA, and similarity to known proteins) were
considered to support gene predictions from the different methods. The use of evidence is quite liberal, requiring only a partial match to a single exon of predicted transcript. †This
number includes alternative splice forms of the 17,764 genes mentioned elsewhere in the text.
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Examination of pericentromeric regions is
ongoing.

The remaining ~80% of the genome, the
euchromatic component, is divisible into G-, R-,
and T-bands (67). These cytogenetic bands
have been presumed to differ in their nucleotide
composition and gene density, although we
have been unable to determine precise band
boundaries at the molecular level. T-bands are
the most G+C- and gene-rich, and G-bands are
G+C-poor (68). Bernardi has also offered a
description of the euchromatin at the molecular
level as long stretches of DNA of differing base
composition, termed isochores (denoted L, H1,
H2, and H3), which are >300 kbp in length
(69). Bernardi defined the L (light) isochores as
G+C-poor (<43%), whereas the H (heavy)
isochores fall into three G+C-rich classes rep-
resenting 24, 8, and 5% of the genome. Gene
concentration has been claimed to be very low
in the L isochores and 20-fold more enriched in
the H2 and H3 isochores (70). By examining
contiguous 50-kbp windows of G+C content
across the assembly, we found that regions of
G+C content >48% (H3 isochores) averaged
273.9 kbp in length, those with G+C content
between 43 and 48% (H1+H2 isochores) aver-
aged 202.8 kbp in length, and the average span
of regions with <43% (L isochores) was
1078.6 kbp. The correlation between G+C
content and gene density was also examined in
50-kbp windows along the assembled sequence
(Table 9 and Figs. 10 and 11). We found that
the density of genes was greater in regions of
high G+C than in regions of low G+C content,
as expected. However, the correlation between
G+C content and gene density was not as
skewed as previously predicted (69). A higher
proportion of genes were located in the G+C-
poor regions than had been expected.
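
The window-based G+C analysis can be sketched as below. The class boundaries (>48%, 43 to 48%, <43%) are those given in the text; the function names, the exclusion of N bases from the denominator, and the handling of a short final window are assumptions made for this illustration.

```python
def gc_percent(seq):
    """G+C content (%) over unambiguous bases; None if no A/C/G/T present."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at) if gc + at else None

def isochore_class(pct):
    """Map a G+C percentage onto the three classes used in the text."""
    if pct > 48:
        return "H3"
    if pct >= 43:
        return "H1+H2"
    return "L"

def classify_windows(seq, window=50_000):
    """Label each contiguous, non-overlapping window by isochore class."""
    labels = []
    for start in range(0, len(seq), window):
        pct = gc_percent(seq[start:start + window])
        labels.append(None if pct is None else isochore_class(pct))
    return labels

# Toy sequence with a tiny window so the result is easy to follow.
print(classify_windows("GC" * 10 + "AT" * 10, window=20))
```

Runs of adjacent windows carrying the same label would then be merged to measure region lengths such as the averages quoted above.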

Chromosomes 17, 19, and 22, which have
a disproportionate number of H3-containing
bands, had the highest gene density (Table
10). Conversely, the chromosomes that we
found to have the lowest gene density, X, 4,
18, 13, and Y, also have the fewest H3 bands.
Chromosome 15, which also has few H3
bands, did not have a particularly low gene
density in our analysis. In addition, chromo-
some 8, which we found to have a low gene
density, does not appear to be unusual in its
H3 banding.

How valid is Ohno's postulate (71) that
mammalian genomes consist of oases of genes
in otherwise essentially empty deserts? It ap-
pears that the human genome does indeed con-
tain deserts, or large, gene-poor regions. If we
define a desert as a region >500 kbp without a
gene, then we see that 605 Mbp, or about 20%
of the genome, is in deserts. These are not
uniformly distributed over the various chromo-
somes. Gene-rich chromosomes 17, 19, and 22
have only about 12% of their collective 171
Mbp in deserts, whereas gene-poor chromo-
somes 4, 13, 18, and X have 27.5% of their 492
Mbp in deserts (Table 11). The apparent lack of
predicted genes in these regions does not nec-
essarily imply that they are devoid of biological
function.
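
Given gene coordinates on a chromosome, deserts under the >500 kbp definition can be found by scanning the gaps between genes. The sketch below assumes half-open `(start, end)` gene intervals and counts the stretches before the first gene and after the last gene as candidate deserts; both choices, and the function name, are assumptions of this example.

```python
def find_deserts(genes, chrom_len, min_len=500_000):
    """Return (start, end) gaps longer than min_len that contain no gene."""
    deserts = []
    prev_end = 0
    for start, end in sorted(genes):
        if start - prev_end > min_len:
            deserts.append((prev_end, start))
        # Track the furthest gene end seen so far (genes may overlap).
        prev_end = max(prev_end, end)
    if chrom_len - prev_end > min_len:
        deserts.append((prev_end, chrom_len))
    return deserts

genes = [(100_000, 200_000), (900_000, 950_000)]
found = find_deserts(genes, chrom_len=1_000_000)
print(found)
print(sum(e - s for s, e in found) / 1_000_000)  # fraction in deserts
```

Summing the desert lengths over all chromosomes and dividing by genome length gives the kind of fraction reported in the text.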

4.2 Linkage map

Linkage maps provide the basis for genetic
analysis and are widely used in the study of the
inheritance of traits and in the positional clon-
ing of genes. The distance metric, centimorgans
(cM), is based on the recombination rate be-
tween homologous chromosomes during meio-
sis. In general, the rate of recombination in
females is greater than that in males, and this
degree of map expansion is not uniform across
the genome (72). One of the opportunities en-
abled by a nearly complete genome sequence is
to produce the ultimate physical map, and to
fully analyze its correspondence with two other
maps that have been widely used in genome
and genetic analysis: the linkage map and the
cytogenetic map. This would close the loop
between the mapping and sequencing phases of
the genome project.

We mapped the location of the markers
that constitute the Genethon linkage map to
the genome. The rate of recombination, ex-
pressed as cM per Mbp, was calculated for
3-Mbp windows as shown in Table 12. High-
er rates of recombination in the telomeric
region of the chromosomes have been previ-
ously documented (73). From this mapping
result, there is a difference of 4.99 between
lowest rates and highest rates and the largest
difference of 4.4 between males and females
(4.99 to 0.47 on chromosome 16). This indi-
cates that the variability in recombination
rates among regions of the genome exceeds
the differences in recombination rates be-
tween males and females. The human ge-
nome has recombination hotspots, where re-
combination rates vary fivefold or more over
a space of 1 kbp, so the picture one gets of the
magnitude of variability in recombination
rate will depend on the size of the window

Table 9. Characteristics of G+C in isochores.


                      Fraction of genome (%)     Fraction of genes (%)
Isochore  G+C (%)     Predicted*    Observed     Predicted*    Observed
H3        >48         5             9.5          37            24.8
H1/H2     43-48       25            21.2         32            26.6
L         <43         67            69.2         31            48.5

*The predictions were based on Bernardi's definitions (70) of the isochore structure of the human genome.



Fig. 9. Comparison of the number of exons per transcript between the 17,968 Otto transcripts and 21,350 de novo transcript predictions with at least one line of evidence that do not overlap with an Otto prediction. Both sets have the highest number of transcripts in the two-exon category, but the de novo gene predictions are skewed much more toward smaller transcripts. In the Otto set, 19.7% of the transcripts have one or two exons, and 5.7% have more than 20. In the de novo set, 49.3% of the transcripts have one or two exons, and 0.2% have more than 20.
[Bar chart. x-axis: number of exons per transcript (up to >20); y-axis: no. of transcripts (up to 7000); series: no. of Otto transcripts, no. of de novo + 1 line of evidence.]






16 FEBRUARY 2001 VOL 291 SCIENCE www.sciencemag.org



examined. Unfortunately, too few meiotic
crossovers have occurred in Centre d'Etude
du Polymorphisme Humain (CEPH) and other
reference families to provide a resolution any
finer than about 3 Mbp. The next challenge
will be to determine a sequence basis of
recombination at the chromosomal level. An
accurate predictor for the rate for variation in
recombination rates between any pair of
markers would be extremely useful in design-
ing markers to narrow a region of linkage,
such as in positional cloning projects.
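
A recombination rate in cM per Mbp for fixed windows can be estimated from markers that have both a physical and a genetic position. The sketch below is one simple way to do this and is not the procedure used for Table 12: it takes the outermost markers in each window and divides their genetic distance by their physical distance. The function name and the input layout are assumptions.

```python
def recombination_rates(markers, window_bp=3_000_000):
    """markers: iterable of (position_bp, position_cM) on one chromosome.
    Returns {window_index: cM per Mbp}, using the outermost markers in
    each window. Windows with fewer than two distinct positions are skipped."""
    by_window = {}
    for pos, cm in markers:
        by_window.setdefault(pos // window_bp, []).append((pos, cm))
    rates = {}
    for index, points in by_window.items():
        (p0, c0), (p1, c1) = min(points), max(points)
        if p1 > p0:
            rates[index] = (c1 - c0) / ((p1 - p0) / 1e6)
    return rates

markers = [(0, 0.0), (2_000_000, 3.0), (3_500_000, 4.0), (5_500_000, 5.0)]
print(recombination_rates(markers))
```

Shrinking `window_bp` in such a calculation quickly leaves windows with too few markers, which mirrors the resolution limit of about 3 Mbp discussed above.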

4.3 Correlation between CpG islands
and genes

CpG islands are stretches of unmethylated
DNA with a higher frequency of CpG
dinucleotides when compared with the entire
genome (74). CpG islands are believed to
preferentially occur at the transcriptional start
of genes, and it has been observed that most
housekeeping genes have CpG islands at the
5' end of the transcript (75, 76). In addition,
experimental evidence indicates that CpG is-
land methylation is correlated with gene in-
activation (77) and has been shown to be
important during gene imprinting (78) and
tissue-specific gene expression (79).

Experimental methods have been used
that resulted in an estimate of 30,000 to
45,000 CpG islands in the human genome
(74, 80) and an estimate of 499 CpG islands
on human chromosome 22 (81). Larsen et
al. (76) and Gardiner-Garden and Frommer
(75) used a computational method to iden-
tify CpG islands and defined them as re-
gions of DNA of >200 bp that have a G+C
content of >50% and a ratio of observed
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versus expected frequency of CG dinucle-
otide ≥0.6.

It is difficult to make a direct compari-
son of experimental definitions of CpG is-
lands with computational definitions be-
cause computational methods do not con-
sider the methylation state of cytosine and
experimental methods do not directly select
regions of high G+C content. However, we
can determine the correlation of CpG islands
with gene starts, given a set of annotated
genomic transcripts and the whole genome
sequence. We have analyzed the publicly
available annotation of chromosome 22, as
well as using the entire human genome in
our assembly and the computationally an-
notated genes. A variation of the CpG is-
land computation was compared with
Larsen et al. (76). The main differences are
that we use a sliding window of 200 bp,
consecutive windows are merged only if
they overlap, and we recompute the CpG
value upon merging, thus rejecting any po-
tential island if it scores less than the
threshold.
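
The sliding-window procedure described above can be sketched as follows. This is an illustrative reading of the description, not the code used in the study: the observed/expected ratio is computed with the common formula (number of CG) x (length) / (number of C x number of G), the step size of 1 bp is an assumption, and all function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CG ratio) for seq."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def passes(seq, min_gc=0.5, min_ratio=0.6):
    """G+C content above min_gc and CG ratio at or above min_ratio."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction > min_gc and obs_exp >= min_ratio

def find_islands(seq, window=200, step=1, min_ratio=0.6):
    """Slide a window, merge passing windows that overlap, then
    recompute the statistics on each merged span and drop failures."""
    spans = []
    for start in range(0, len(seq) - window + 1, step):
        if passes(seq[start:start + window], min_ratio=min_ratio):
            if spans and start < spans[-1][1]:
                spans[-1][1] = start + window      # overlaps: extend
            else:
                spans.append([start, start + window])
    return [(s, e) for s, e in spans if passes(seq[s:e], min_ratio=min_ratio)]

# Toy sequence: a 300-bp CG-rich block flanked by AT-only sequence.
toy = "AT" * 150 + "CG" * 150 + "AT" * 150
print(find_islands(toy))
```

Calling `find_islands(toy, min_ratio=0.8)` applies the stricter threshold in the same way, since only the ratio cutoff changes between the two settings.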

To compute various CpG statistics, we
used two different thresholds of CG dinucle-
otide likelihood ratio. Besides using the orig-
inal threshold of 0.6 (method 1), we used a
higher threshold of CG dinucleotide likeli-
hood ratio of 0.8 (method 2), which results in
the number of CpG islands on chromosome
22 close to the number of annotated genes on
this chromosome. The main results are sum-
marized in Table 13. CpG islands computed
with method 1 predicted only 2.6% of the
CSA sequence as CpG, but 40% of the gene
starts (start codons) are contained inside a
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Fig. 10. Relation between G+C content and gene density. The blue bars show the percent of the 
genome (in 50-kbp windows) with the indicated G+C content. The percent of the total number of
genes associated with each G+C bin is represented by the yellow bars. The graph shows that about
5% of the genome has a G+C content of between 50 and 55%, but that this portion contains 
nearly 15% of the genes. 



CpG island. This is comparable to ratios re- 
ported by others (82). The last two rows of 
the table show the observed and expected 
average distance," respectively, of the closest 
CpG island from the first exon. The observed 
average closest CpG islands are smaller than 
the corresponding expected distances, con- 
firming an association between CpG island 
and the first exon. . \ . 

We also looked at the distribution of CpG island nucleotides among various sequence
classes such as intergenic regions, introns, exons, and first exons. We computed the
likelihood score for each sequence class as the ratio of the observed fraction of CpG
island nucleotides in that sequence class to the expected fraction of CpG island
nucleotides in that sequence class. Applying method 1 to the CSA gave scores of 0.89 for
intergenic regions, 1.2 for introns, 5.86 for exons, and 13.2 for first exons. The same
trend was also found for chromosome 22 and after the application of a higher threshold
(method 2) to both data sets. In sum, genome-wide analysis has extended earlier analysis
and suggests a strong correlation between CpG islands and first coding exons.
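
The likelihood score described above can be sketched as a ratio of two fractions. The sketch below is illustrative only: the function name and arguments are hypothetical, and it assumes that the "expected fraction" is simply the share of the genome occupied by the sequence class (that is, CpG island nucleotides spread uniformly), which the text does not state explicitly.

```python
def likelihood_score(island_nt_in_class, island_nt_total, class_nt, genome_nt):
    """Observed / expected fraction of CpG island nucleotides in one sequence class.

    Assumption (not stated in the text): the expected fraction equals the
    fraction of the genome that belongs to the sequence class.
    """
    observed = island_nt_in_class / island_nt_total  # share of island bases in the class
    expected = class_nt / genome_nt                  # share of all bases in the class
    return observed / expected


# Toy numbers, not taken from the paper: a class covering 1% of the genome
# that holds 10% of all CpG island nucleotides is enriched about 10-fold.
score = likelihood_score(island_nt_in_class=5, island_nt_total=50,
                         class_nt=10, genome_nt=1000)
print(round(score, 2))
```

A score above 1 indicates enrichment of CpG island nucleotides in the class, and a score below 1 indicates depletion, which matches the way the reported scores for intergenic regions and first exons are read.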

4.4 Genome-wide repetitive elements 

The proportion of the genome covered by various classes of repetitive DNA is presented
in Table 14. We observed about 35% of the genome in these repeat classes, very similar
to values reported previously (83). Repetitive sequence may be underrepresented in the
Celera assembly as a result of incomplete repeat resolution, as discussed above. About
8% of the scaffold length is in gaps, and we expect that much of this is repetitive
sequence. Chromosome 19 has the highest repeat density (57%), as well as the highest
gene density (Table 10). Of interest, among the different classes of repeat elements, we
observe a clear association of Alu elements and gene density, which was not observed
between LINEs and gene density.

5 Genome Evolution
Summary. The dynamic nature of genome evolution can be captured at several levels.
These include gene duplications mediated by RNA intermediates (retrotransposition) and
segmental genomic duplications. In this section, we document the genome-wide occurrence
of retrotransposition events generating functional (intronless paralogs) or inactive
genes (pseudogenes). Genes involved in translational processes and nuclear regulation
account for nearly 50% of all intronless paralogs and processed pseudogenes detected in
our survey. We have also cataloged the extent of segmental genomic duplication and
provide evidence for 1077 duplicated blocks covering 3522 distinct genes.

Fig. 11. Genome structural features.

5.1 Retrotransposition in the human genome

Retrotransposition of processed mRNA transcripts into the genome results in functional
genes, called intronless paralogs, or inactivated genes (pseudogenes). A paralog refers
to a gene that appears in more than one copy in a given organism as a result of a
duplication event. The existence of both intron-containing and intronless forms of genes
encoding functionally similar or identical proteins has been previously described
(84, 85). Cataloging these evolutionary events on the genomic landscape is of value in
understanding the functional consequences of such gene-duplication events in cellular
biology. Identification of conserved intronless paralogs in the mouse or other mammalian
genomes should provide the basis for capturing the evolutionary chronology of these
transposition events and provide insights into gene loss and accretion in the mammalian
radiation. A set of proteins corresponding to all 901



www.sciencemag.org SCIENCE VOL 291 16 FEBRUARY 2001



THE HUMAN GENOME 



Otto-predicted, single-exon genes were subjected to BLAST analysis against the proteins
encoded by the remaining multiexon predicted transcripts. Using homology criteria of
70% sequence identity over 90% of the length, we identified 298 instances of single- to
multi-exon correspondence. Of these 298 sequences, 97 were represented in the GenBank
data set of experimentally validated full-length genes at the stringency specified and
were verified by manual inspection.

We believe that these 97 cases may represent intronless paralogs (see Web table 1 on
Science Online at www.sciencemag.org/cgi/content/full/291/5507/1304/DC1) of known
genes. Most of these are flanked by direct repeat sequences, although the precise nature
of these repeats remains to be determined. All of the cases for which we have high
confidence contain polyadenylated [poly(A)] tails characteristic of retrotransposition.
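
The homology criteria used above (70% sequence identity over 90% of the length) can be illustrated with a small filter over alignment hits. This is a sketch under stated assumptions, not the authors' pipeline: the `Hit` record and its field names are hypothetical, and coverage is assumed to be measured against the length of the single-exon query protein, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment between a single-exon protein and a multiexon protein (hypothetical record)."""
    query_id: str
    subject_id: str
    percent_identity: float   # identity over the aligned region, in percent
    alignment_length: int     # aligned residues
    query_length: int         # full length of the single-exon protein


def passes_homology_criteria(hit, min_identity=70.0, min_coverage=0.90):
    """True if the hit has >=70% identity over >=90% of the query length (assumed convention)."""
    coverage = hit.alignment_length / hit.query_length
    return hit.percent_identity >= min_identity and coverage >= min_coverage


hits = [
    Hit("single_exon_1", "multi_exon_7", 85.0, 95, 100),   # passes both thresholds
    Hit("single_exon_2", "multi_exon_9", 85.0, 50, 100),   # fails coverage
    Hit("single_exon_3", "multi_exon_4", 60.0, 100, 100),  # fails identity
]
candidates = [h for h in hits if passes_homology_criteria(h)]
print([h.query_id for h in candidates])
```

The same filter with `min_coverage=0.70` would correspond to the looser criterion (70% identity over at least 70% of the transcript length) used later for processed pseudogenes.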

Recent publications describing the phenomenon of functional intronless paralogs
speculate that retrotransposition may serve as a mechanism used to escape X-chromosomal
inactivation (85, 86). We do not find a bias toward X chromosome origination of these
retrotransposed genes; rather, the results show a random chromosome distribution of
both the intron-containing and corresponding intronless paralogs. We also have found
several cases of retrotransposition from a single source chromosome to multiple target
chromosomes. Interesting examples include the retrotransposition of a five-exon-containing
ribosomal protein L21 gene on chromosome 13 onto chromosomes 1, 3, 4, 7, 10, and 14.
The size of the source genes can also show variability. The largest example is the
31-exon diacylglycerol kinase zeta gene on chromosome 11 that has an intronless paralog
on chromosome 13. Regardless of route, retrotransposition with subsequent gene changes
in coding or noncoding regions that lead to different functions or expression patterns
represents a key route to providing an enhanced functional repertoire in mammals (87).

Our preliminary set of retrotransposed intronless paralogs contains a clear
overrepresentation of genes involved in translational processes (40% ribosomal proteins
and 10% translation elongation factors) and nuclear regulation (HMG nonhistone proteins,
4%), as well as metabolic and regulatory enzymes. EST matches specific to a subset of
intronless paralogs suggest that these paralogs are expressed. Differences in the
upstream regulatory sequences between the source genes and their intronless paralogs
could account for differences in tissue-specific gene expression. Defining which, if any,
of these processed genes are functionally expressed and translated will require further
elucidation and experimental validation.

5.2 Pseudogenes

A pseudogene is a nonfunctional copy that is very similar to a normal gene but that has
been altered slightly so that it is not expressed. We developed a method for the
preliminary analysis of processed pseudogenes in the human genome as a starting point in
elucidating the ongoing evolutionary forces

Table 11. Genome overview.


Size of the genome (including gaps)
Size of the genome (excluding gaps)
Longest contig
Longest scaffold
Percent of A+T in the genome
Percent of G+C in the genome
Percent of undetermined bases in the genome
Most GC-rich 50 kb
Least GC-rich 50 kb
Percent of genome classified as repeats
Number of annotated genes
Percent of annotated genes with unknown function
Number of genes (hypothetical and annotated)
Percent of hypothetical and annotated genes with unknown function
Gene with the most exons
Average gene size
Most gene-rich chromosome
Least gene-rich chromosomes
Total size of gene deserts (>500 kb with no annotated genes)
Percent of base pairs spanned by genes
Percent of base pairs spanned by exons
Percent of base pairs spanned by introns
Percent of base pairs in intergenic DNA
Chromosome with highest proportion of DNA in annotated exons
Chromosome with lowest proportion of DNA in annotated exons
Longest intergenic region (between annotated + hypothetical genes)
Rate of SNP variation

*In these ranges, the percentages correspond to the annotated gene set (26,383 genes) and the hypothetical +
annotated gene set (39,114 genes), respectively.

Table 12. Rate of recombination per physical distance (cM/Mb) across the genome. Genethon markers
were placed on CSA-mapped assemblies, and then relative physical distances and rates were calculated
in 3-Mb windows for each chromosome. NA, not applicable.
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Titin (234 exons) 
27 kbp 
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Chr. 13 (5 genes/Mb). 
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605 Mbp ^ 
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1.1 to 1.4* 

24.4 to 36.4*

74.5 to 63.6*
Chr. 19 (9.33)
Chr. Y (0.36)

Chr. 13 (3,038,416 bp)
1/1250 bp 
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0.48 


1.67 
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0.76 
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0.49 


12 


4.12 • ' 


0.76 


0.26 
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0.49 


2,93 
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0.59 


13 
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0.75 


0.01 
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0.17 
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0.62 


3.14 


1.63 
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NA 


NA 


NA 


NA 


NA 
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NA 


NA 


NA 
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NA 


NA 


NA 


NA 
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0.17 


439 


1.55 


032

that account for gene inactivation. The general structural characteristics of these
processed pseudogenes include the complete lack of intervening sequences found in the
functional counterparts, a poly(A) tract at the 3' end, and direct repeats flanking the
pseudogene sequence. Processed pseudogenes occur as a result of retrotransposition,
whereas unprocessed pseudogenes arise from segmental genome duplication.

We searched the complete set of Otto-predicted transcripts against the genomic sequence
by means of BLAST. Genomic regions corresponding to all Otto-predicted transcripts were
excluded from this analysis. We identified 2909 regions matching with greater than 70%
identity over at least 70% of the length of the transcripts that likely represent
processed pseudogenes. This number is probably an underestimate because specific methods
to search for pseudogenes were not used.

We looked for correlations between structural elements and the propensity for
retrotransposition in the human genome. GC content and transcript length were compared
between the genes with processed pseudogenes (1177 source genes) and the remainder of
the predicted gene set. Transcripts that give rise to processed pseudogenes have a
shorter average transcript length (1027 bp versus 1594 bp for the Otto set) than genes
for which no pseudogene was detected. The overall GC content did not show any
significant difference, contrary to a recent report (88). There is a clear trend in the
gene families that are present as processed pseudogenes. These include ribosomal
proteins (67%), lamin receptors (10%), translation elongation factor alpha (5%), and
HMG nonhistone proteins (2%). The increased occurrence of retrotransposition (both
intronless paralogs and processed pseudogenes) among genes involved in translation and
nuclear regulation may reflect an increased transcriptional activity of these genes.

5.3 Gene duplication in the human genome

Building on a previously published procedure (27), we developed a graph-theoretic
algorithm, called Lek, for grouping the predicted human protein set into protein
families (89).



Table 13. Characteristics of CpG islands identified in chromosome 22 (34-Mbp sequence length) and the
whole genome (2.9-Gbp sequence length) by means of two different methods. Method 1 uses a CG
likelihood ratio of ≥0.6. Method 2 uses a CG likelihood ratio of ≥0.8.

Table 14. Distribution of repetitive DNA in the compartmentalized shotgun assembly sequence.

Repetitive elements                           Megabases in   Percent of   Previously
                                              assembled      assembly     predicted
                                              sequences                   (%) (83)

Alu                                           288            9.9          10.0
Mammalian interspersed repeat (MIR)           66             2.3          1.7
Medium reiteration (MER)                      50             1.7          1.6
Long terminal repeat (LTR)                    155            5.3          5.6
Long interspersed nucleotide element (LINE)   466            16.1         16.7
Total                                         1025           35.3         35.6


The complete clusters that result from the Lek clustering provide one basis for
comparing the role of whole-genome or chromosomal duplication in protein family
expansion as opposed to other means, such as tandem duplication. Because each complete
cluster represents a closed and certain island of homology, and because Lek is capable
of simultaneously clustering protein complements of several organisms, the number of
proteins contributed by each organism to a complete cluster can be predicted with
confidence depending on the quality of the annotation of each genome. The variance of
each organism's contribution to each cluster can then be calculated, allowing an
assessment of the relative importance of large-scale duplication versus smaller-scale,
organism-specific expansion and contraction of protein families, presumably as a result
of natural selection operating on individual protein families within an organism. As
can be seen in Fig. 12, the large variance in the relative numbers of human as compared
with D. melanogaster and Caenorhabditis elegans proteins in complete clusters may be
explained by multiple events of relative expansions in gene families in each of the
three animal genomes. Such expansions would give rise to the distribution that shows a
peak at 1:1 in the ratio for human-worm or human-fly clusters with the slope spread
covering both human and fly/worm predominance, as we observed (Fig. 12). Furthermore,
there are nearly as many clusters where worm and fly proteins predominate despite the
larger numbers of proteins in the human. At face value, this analysis suggests that
natural selection acting on individual protein families has been a major force driving
the expansion of at least some elements of the human protein set. However, in our
analysis, the difference between an ancient whole-genome duplication followed by loss,
versus piecemeal duplication, cannot be easily distinguished. In order to differentiate
these scenarios, more extended analyses were performed.

5.4 Large-scale duplications

Using two independent methods, we searched for large-scale duplications in the human
genome. First, we describe a protein family-based method that identified highly
conserved blocks of duplication. We then describe our comprehensive method for
identifying all interchromosomal block duplications. The latter method identified a
large number of duplicated chromosomal segments covering parts of all 24 chromosomes.

The first of the methods is based on the idea of searching for blocks of highly
conserved homologous proteins that occur in more than one location on the genome. For
this comparison, two genes were considered equivalent if their protein products were
determined to be in the same family and the same complete Lek cluster (essentially
paralogous genes) (89). Initially, each chromosome was represented as a string of genes
ordered by the start codons for predicted genes along the chromosome. We considered the
two strands as a single string, because local inversions are relatively common events
relative to large-scale duplications. Each gene was indexed according to the protein
family and Lek complete cluster (89). All pairs of indexed gene strings were then
aligned in both the forward and reverse directions with the Smith-Waterman algorithm
(90). A match between two proteins of the same Lek complete cluster was given a score of
10 and a mismatch -10, with gap open and extend penalties of -4 and -1. With these
parameters, 19 conserved interchromosomal blocks of duplication were observed, all of
which were also detected and expanded by the comprehensive method described below. The
detection of only a relatively small number of block duplications was a consequence of
using an intrinsically conservative method grounded in the conservative constraints of
the complete Lek clusters.
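
The gene-string alignment described above can be sketched as a Smith-Waterman local alignment over lists of cluster identifiers rather than over residues. The code below is a minimal illustration, not the authors' implementation: the function names are hypothetical, cluster identifiers are represented as plain integers, and the gap convention is an assumption (the first gapped position costs the open penalty and each further position costs the extend penalty).

```python
def local_alignment_score(a, b, match=10, mismatch=-10, gap_open=-4, gap_extend=-1):
    """Best Smith-Waterman score between two gene strings (lists of cluster ids).

    Affine gaps are handled with three matrices (Gotoh's formulation).
    Assumed convention: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]        # best score ending at (i, j)
    e = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in a
    f = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in b
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e[i][j] = max(h[i][j - 1] + gap_open, e[i][j - 1] + gap_extend)
            f[i][j] = max(h[i - 1][j] + gap_open, f[i - 1][j] + gap_extend)
            pair = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0, h[i - 1][j - 1] + pair, e[i][j], f[i][j])
            best = max(best, h[i][j])
    return best


def best_block_score(a, b):
    """Align in both the forward and reverse directions and keep the better score."""
    return max(local_alignment_score(a, b), local_alignment_score(a, b[::-1]))


# Toy gene strings: integers stand for Lek complete-cluster ids (made-up values).
chrom_x = [1, 2, 9, 3, 4]
chrom_y = [1, 2, 3, 4]
print(local_alignment_score(chrom_x, chrom_y))   # four matches and one single-gene gap
print(best_block_score([1, 2, 3, 4], [4, 3, 2, 1]))
```

With a mismatch score as severe as -10 and cheap gaps, the alignment prefers skipping unrelated genes over pairing them, which fits the stated goal of finding blocks of paralogs that tolerate intervening insertions.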

In the second, more comprehensive approach, we aligned all chromosomes directly with one
another using an algorithm based on the MUMmer system (91). This alignment method uses a
suffix tree data structure and a linear-time algorithm to align long sequences very
rapidly; for example, two chromosomes of 100 Mbp can be aligned in less than 20 min (on
a Compaq Alpha computer) with 4 gigabytes of memory. This procedure was used recently to
identify numerous large-scale segmental duplications among the five chromosomes of
A. thaliana (92); in that organism, the method revealed that 60% of the genome (66 Mbp)
is covered by 24 very large duplicated segments. For Arabidopsis, a DNA-based alignment
was sufficient to reveal the segmental duplications between chromosomes; in the human
genome, DNA alignments at the whole-chromosome level are insufficiently sensitive.
Therefore, a modified procedure was developed and applied, as follows. First, all 26,588
proteins (9,675,713 amino acids) were concatenated end-to-end in order as they occur
along each of the 24 chromosomes, irrespective of strand location. The concatenated
protein set was then aligned against each chromosome by the MUMmer algorithm. The
resulting matches were clustered to extract all sets of three or more protein matches
that occur in close proximity on two different chromosomes (93); these represent the
candidate segmental duplications. A series of filters were developed and applied to
remove likely false-positives from this set; for example, small blocks that were spread
across many proteins were removed. To refine the filtering methods, a shuffled protein
set was first created by taking the 26,588 proteins, randomizing their order, and then
partitioning them into 24 shuffled chromosomes, each containing the same number of
proteins as the true genome. This shuffled protein set has the identical composition to
the real genome; in particular, every protein and every domain appears the same number
of times. The complete algorithm was then applied to both the real and the shuffled
data, with the results on the shuffled data being used to estimate the false-positive
rate. The algorithm after filtering yielded 10,310 gene pairs in 1077 duplicated blocks
containing 3522 distinct genes; tandemly duplicated expansions in many of the blocks
explain the excess of gene pairs to distinct genes. In the shuffled data, by contrast,
only 370 gene pairs were found, giving a false-positive estimate of 3.6%. The most
likely explanation for the 1077 block duplications is ancient segmental duplications.
In many cases, the order of the proteins has been shuffled, although proximity is
preserved. Out of the 1077 blocks, 159 contain only three genes, 137 contain four genes,
and 781 contain five or more genes.
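
The shuffled-genome control described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the function names and the toy protein labels are hypothetical, and the false-positive rate is assumed to be the number of gene pairs found in the shuffled data divided by the number found in the real data, which reproduces the reported 3.6%.

```python
import random


def shuffle_into_chromosomes(chromosomes, seed=0):
    """Randomize protein order, then refill chromosomes of the original sizes.

    Every protein appears exactly once, so the composition of the shuffled
    genome is identical to that of the real one.
    """
    rng = random.Random(seed)
    proteins = [p for chrom in chromosomes for p in chrom]
    rng.shuffle(proteins)
    shuffled, start = [], 0
    for chrom in chromosomes:
        shuffled.append(proteins[start:start + len(chrom)])
        start += len(chrom)
    return shuffled


def false_positive_rate(pairs_in_shuffled, pairs_in_real):
    """Fraction of reported gene pairs expected by chance (assumed definition)."""
    return pairs_in_shuffled / pairs_in_real


# Toy genome with three chromosomes (made-up protein labels).
real = [["a", "b", "c"], ["d", "e"], ["f"]]
control = shuffle_into_chromosomes(real)
print([len(c) for c in control])

# Figures quoted in the text: 370 pairs in shuffled data, 10,310 in real data.
print(f"{false_positive_rate(370, 10310):.1%}")
```

Fixing the seed keeps the control reproducible; in practice several independent shuffles would give a sense of how much the chance-level pair count varies.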

To illustrate the extent of the detected duplications, Fig. 13 shows all 1077 block
duplications indexed to each chromosome in 24 panels in which only duplications mapped
to the indexed chromosome are displayed. The figure makes it clear that the duplications
are ubiquitous in the genome. One feature that it displays is many relatively small
chromosomal stretches with one-to-many duplication relationships that are graphically
striking. One such example captured by the analysis is the well-documented olfactory
receptor (OR) family, which is scattered in blocks throughout the genome and which has
been analyzed for genome-deployment reconstructions at several evolutionary stages (94).
The figure also illustrates that some chromosomes, such as chromosome 2, contain many
more detected large-scale duplications than others. Indeed, one of the largest
duplicated segments is a large block of 33 proteins on chromosome 2, spread among eight
smaller blocks in 2p, that aligns to a paralogous set on chromosome 14, with one
rearrangement (see chromosomes 2 and 14 panels in Fig. 13). The proteins are not
contiguous but span a region containing 97 proteins on chromosome 2 and 332 proteins on
chromosome 14. The likelihood of observing this many duplicated proteins by chance,
even over a span of this length, is 2.3 X 10~^« (93). This duplicated set spans 20 Mbp
on chromosome 2 and 63 Mbp on chromosome 14, over 1Q!% of the latter chromosome.
Chromosome 2 also contains a block duplication that is nearly as large, which is shared
by chromosome arm 2q and chromosome 12. This duplication incorporates two of the four
known Hox gene clusters, but considerably expands the extent of the duplications
proximally and distally on the pair of chromosome arms. This breadth of duplication is
also seen on the two chromosomes carrying the other two Hox clusters.

An additional large duplication, between chromosomes 18 and 20, serves as a good example
to illustrate some of the features common to many of the other observed large
duplications (Fig. 13, inset). This duplication contains 64 detected ordered
intrachromosomal pairs of homologous genes. After discounting a 40-Mb stretch of
chromosome 18 free of matches to chromosome 20, which is likely to represent a large
insert (between the gene assignments "Krup rel" and "collagen rel" on chromosome 18 in
Fig. 13), the full duplication segment covers 36 Mb on chromosome 18 and 28 Mb on
chromosome 20.

Fig. 12. Gene duplication in complete protein clusters. The predicted protein sets of human, worm,
and fly were subjected to Lek clustering (27). The numbers of clusters with varying ratios (whole
number) of human versus worm and human versus fly proteins per cluster were plotted.






By this measure, the duplication segment spans nearly half of each chromosome's net
length. The most likely scenario is that the whole span of this region was duplicated
as a single very large block, followed by shuffling owing to smaller scale
rearrangements. As such, at least four subsequent rearrangements would need to be
invoked to explain the relative insertions and inversions seen in the duplicated segment
interval. The 64 protein pairs in this alignment occur among 217 protein assignments on
chromosome 18, and among 322 protein assignments on chromosome 20, for a density of
involved proteins of 20 to 30%. This is consistent with an ancient large-scale
duplication followed by subsequent gene loss on one or both chromosomes. Loss of just
one member of a gene pair subsequent to the duplication would result in a failure to
score a gene pair in the block; less than 50% gene loss on the chromosomes would lead
to the duplication density observed here. As an independent verification of the
significance of the alignments detected, it can be seen that a substantial number of the
pairs of aligning proteins in this duplication, including some of those annotated
(Fig. 13), are those populating small Lek complete clusters (see above). This indicates
that they are members of very small families of paralogs; their relative scarcity within
the genome validates the uniqueness and robust nature of their alignments.
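
The "20 to 30%" density quoted above follows directly from the counts given in the text. The short calculation below is only a worked check of that arithmetic; the variable names are illustrative.

```python
paired_proteins = 64          # protein pairs in the chromosome 18 / 20 alignment
assignments_chr18 = 217       # protein assignments in the chromosome 18 span
assignments_chr20 = 322       # protein assignments in the chromosome 20 span

density_chr18 = paired_proteins / assignments_chr18
density_chr20 = paired_proteins / assignments_chr20

# Roughly 29% on chromosome 18 and 20% on chromosome 20.
print(f"chr18: {density_chr18:.1%}, chr20: {density_chr20:.1%}")
```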

Two additional qualitative features were observed among many of the large-scale
duplications. First, several proteins with disease associations, with OMIM (Online
Mendelian Inheritance in Man) assignments, are members of duplicated segments (see Web
table 2 on Science Online at www.sciencemag.org/cgi/content/full/291/5507/1304/DC1). We
have also observed a few instances where paralogs on both duplicated segments are
associated with similar disease conditions. Notable among these genes are proteins
involved in hemostasis (coagulation factors) that are associated with bleeding
disorders, transcriptional regulators like the homeobox proteins associated with
developmental disorders, and potassium channels associated with cardiovascular
conduction abnormalities. For each of these disease genes, closer study of the
paralogous genes in the duplicated segment may reveal new insights into disease
causation, with further investigation needed to determine whether they might be
involved in the same or similar genetic diseases. Second, although there is a conserved
number of proteins and coding exons predicted for specific large duplicated spans
within the chromosome 18 to 20 alignment, the genomic DNA of chromosome 18 in these
specific spans is in some cases more than 10-fold longer than the corresponding
chromosome 20 DNA. This selective accretion of noncoding DNA (or conversely, loss of
noncoding DNA) on one of a pair of duplicated chromosome regions was observed in many
compared regions. Hypotheses to explain which mechanisms foster these processes must be
tested.

Evaluation of the alignment results gives some perspective on dating of the
duplications. As noted above, large-scale ancient segmental duplication in fact best
explains many of the blocks detected by this genome-wide analysis. The regions of human
chromosomes involved in the large-scale duplications expanded upon above (chromosomes 2
to 14, 2 to 12, and 18 to 20) are each syntenic to a distinct mouse chromosomal region.
The corresponding mouse chromosomal regions are much more similar in sequence
conservation, and even in order, to their human synteny partners than the human
duplication regions are to each other. Further, the corresponding mouse chromosomal
regions each bear a significant proportion of genes orthologous to the human genes on
which the human duplication assignments were made. On the basis of these factors, the
corresponding mouse chromosomal spans, at coarse resolution, appear to be products of
the same large-scale duplications observed in humans. Although further detailed analysis
must be carried out once a more complete genome is assembled for mouse, the underlying
large duplications appear to predate the two species' divergence. This dates the
duplications, at the latest, before divergence of the primate and rodent lineages. This
date can be further refined upon examination of the synteny between human chromosomes
and those of chicken, pufferfish (Fugu rubripes), or zebrafish (95). The only
substantial syntenic stretches mapped in these species corresponding to both pairs of
human duplications are restricted to the Hox cluster regions. When the synteny of these
regions (or others) to human chromosomes is extended with further mapping, the ages of
the nearly chromosome-length duplications seen in humans are likely to be dated to the
root of vertebrate divergence.

The MUMmer-based results demonstrate large block duplications that range in size from a
few genes to segments covering most of a chromosome. The extent of segmental
duplications raises the question of whether an ancient whole-genome duplication event
is the underlying explanation for the numerous duplicated regions (96). The
duplications have undergone many deletions and subsequent rearrangements; these events
make it difficult to distinguish between a whole-genome duplication and multiple
smaller events. Further analysis, focused especially on comparing the estimated ages of
all the block duplications, derived partially from interspecies genome comparisons,
will be necessary to determine which of these two hypotheses is more likely.
Comparisons of genomes of different vertebrates, and even cross-phyla genome
comparisons, will allow for the deconvolution of duplications to eventually reveal the
stagewise history of our genome, and with it a history of the emergence of many of the
key functions that distinguish us from other living things.

6 A Genome-Wide Examination of Sequence Variations
Summary. Computational methods were used to identify single-nucleotide polymorphisms
(SNPs) by comparison of the Celera sequence to other SNP resources. The SNP rate
between two chromosomes was ~1 per 1200 to 1500 bp. SNPs are distributed nonrandomly
throughout the genome. Only a very small proportion of all SNPs (<1%) potentially
impact protein function based on the functional analysis of SNPs that affect the
predicted coding regions. This results in an estimate that only thousands, not
millions, of genetic variations may contribute to the structural diversity of human
proteins.



Having a complete genome sequence enables researchers to achieve a dramatic
acceleration in the rate of gene discovery, but only through analysis of sequence
variation in DNA can we discover the genetic basis for variation in health among human
beings. Whole-genome shotgun sequencing is a particularly effective method for detecting
sequence variation in tandem with whole-genome assembly. In addition, we compared the
distribution and attributes of SNPs ascertained by three other methods: (i) alignment
of the Celera consensus sequence to the PFP assembly, (ii) overlap of high-quality reads
of genomic sequence (referred to as "Kwok"; 1,120,195 SNPs) (97), and (iii) reduced
representation shotgun sequencing (referred to as "TSC"; 632,640 SNPs) (98). These data
were consistent in showing an overall nucleotide diversity of ~8 × 10^-4, marked
heterogeneity across the genome in SNP density, and an overwhelming preponderance of
noncoding variation that produces no change in expressed proteins.

6.1 SH?s found by aligning the Celera 
consensus to the PFP assembly 
Ideally, methods of SNP discovery make full
use of sequence depth and quality at every site,
and quantitatively control the rate of false-positive
and false-negative calls with an explicit
sampling model (99). Comparison of consensus
sequences in the absence of these details necessitated
a more ad hoc approach (quality scores
could not readily be obtained for the PFP assembly).
First, all sequence differences between
the two consensus sequences were identified;
these were then filtered to reduce the contribution
of sequencing errors and misassembly. As
a measure of the effectiveness of the filtering
step, we monitored the ratio of transition and
transversion substitutions, because a 2:1 ratio
has been well documented as typical in mammalian
evolution (100) and in human SNPs
(101, 102). The filtering steps consisted of removing
variants where the quality score in the
Celera consensus was less than 30 and where
the density of variants was greater than 5 in 400
bp. These filters resulted in shifting the transition-to-transversion
ratio from 1.57:1 to
1.89:1. When applied to 2.3 Gbp of alignments
between the Celera and PFP consensus sequences,
these filters resulted in identification
of 2,104,820 putative SNPs from a total of
2,778,474 substitution differences. Overlaps
between this set of SNPs and those found by
other methods are described below.
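The two filters in section 6.1 and the transition-to-transversion check can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used for the Celera-PFP comparison: the tuple layout `(position, ref, alt, quality)`, the function names, and the centred-window definition of "more than 5 variants in 400 bp" are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right

# Transitions are purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(variants):
    """Ratio of transitions to transversions for (pos, ref, alt, qual) tuples."""
    ts = sum(1 for _, ref, alt, _ in variants if (ref, alt) in TRANSITIONS)
    tv = len(variants) - ts
    return ts / tv if tv else float("inf")


def filter_variants(variants, min_qual=30, max_in_window=5, window=400):
    """Drop low-quality variants and variants lying in dense clusters.

    Assumption: a variant is "dense" when more than `max_in_window`
    candidate variants (itself included, counted before quality
    filtering) fall inside a `window`-bp interval centred on it.
    """
    positions = sorted(pos for pos, _, _, _ in variants)
    half = window // 2
    kept = []
    for pos, ref, alt, qual in variants:
        if qual < min_qual:
            continue  # consensus quality score below the cutoff
        lo = bisect_left(positions, pos - half)
        hi = bisect_right(positions, pos + half - 1)
        if hi - lo > max_in_window:
            continue  # too many differences nearby: likely misassembly
        kept.append((pos, ref, alt, qual))
    return kept


# Hypothetical toy data: three isolated transitions, one low-quality call,
# a cluster of six transversions within 50 bp, and one isolated transversion.
raw = [
    (100, "A", "G", 40), (150, "C", "T", 40), (160, "A", "C", 20),
    (1000, "A", "T", 40), (1010, "A", "T", 40), (1020, "A", "T", 40),
    (1030, "A", "T", 40), (1040, "A", "T", 40), (1050, "A", "T", 40),
    (5000, "G", "A", 35), (9000, "C", "G", 50),
]
kept = filter_variants(raw)
print(f"ts/tv before: {ts_tv_ratio(raw):.2f}, after: {ts_tv_ratio(kept):.2f}")
```

In the toy data the clustered and low-quality calls are mostly transversions, so removing them moves the ratio upward, which is the same direction of change the text reports (1.57:1 to 1.89:1).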
to their being the smallest two sets. In addition,
24.5% of the Celera-PFP SNPs overlap with
SNPs derived from the Celera genome sequences
(46). SNP validation in population
samples is an expensive and laborious process,
so confirmation on multiple data sets may provide
an efficient initial validation "in silico" (by
computational analysis).

One means of assessing whether the 



site. These data are not readily available, so
we could not estimate nucleotide diversity
from the TSC effort. Estimation of nucleotide
diversity from high-quality sequence
overlaps should be possible, but again
more information is needed on the details
of all the alignments.

Estimation of nucleotide diversity from a 
shotgun assembly entails calculating for each 



three sets of SNPs provide the same picture
of human variation is to tally the frequencies of nucleotide changes in



6.2 Comparisons to public SNP
databases

including 2,536,021 from
dbSNP (www.ncbi.nlm.nih.gov/SNP) and
13,150 from HGMD (Human Gene Mutation
Database, from the University of
Wales, UK), were mapped on the Celera consensus
sequence by a sequence similarity
search with the program PowerBlast (103). The
two largest data sets in dbSNP are the Kwok
and TSC sets, with 47% and 25% of the dbSNP
records. Low-quality alignments with partial
coverage of the dbSNP sequence and alignments
that had less than 98% sequence identity
between the Celera sequence and the dbSNP
flanking sequence were eliminated. dbSNP
sequences mapping to multiple locations on the

1,223,038 unique locations on the Celera sequence,
implying considerable redundancy in
dbSNP. SNPs in the TSC set mapped to
585,811 unique genomic locations, and SNPs in
the Kwok set mapped to 438,032 unique locations.
The combined unique SNP count used
in this analysis, including Celera-PFP, TSC,
and Kwok, is 2,737,668. Table 15 shows that a
substantial fraction of SNPs identified by one of
these methods was also found by another method.
The very high overlap (36.2%) between the
Kwok and Celera-PFP SNPs may be due in part
to the use by Kwok of sequences that went into
the PFP assembly. The unusually low overlap
(16.4%) between the Kwok and TSC sets is due



each set of SNPs (Table 16). Previous measures
of nucleotide diversity were mostly
derived from small-scale analysis on candidate
genes (101), and our analysis with
all three data sets validates the previous
observations at the whole-genome scale.
There is remarkable homogeneity between
the SNPs found in the Kwok set, the TSC
set, and in our whole-genome shotgun (46)
in this substitution pattern. Compared with
the rest of the data sets, Celera-PFP deviates
slightly from the 2:1 transition-to-transversion
ratio observed in the other
SNP sets. This result is not unexpected,
because some fraction of the computationally
identified SNPs in the Celera-PFP
comparison may in fact be sequence errors.
A 2:1 transition:transversion ratio for the
bona fide SNPs would be obtained if one


SNP databases. Table entries are SNP counts for
each pair of data sets. Numbers in parentheses are
the fraction of overlap, calculated as the count of
overlapping SNPs divided by the number of SNPs
in the smaller of the two databases compared.

Celera-PFP, 2,104,820; TSC, 585,811; and Kwok, 438,032.



ences in the Celera-PFP set were a result of
(presumably random) sequence errors.

6.3 Estimation of nucleotide diversity
from ascertained SNPs
The number of SNPs identified varied
widely across chromosomes. In order to
normalize these values to the chromosome
size and sequence coverage, we used π, the
standard statistic for nucleotide diversity
(104). Nucleotide diversity is a measure of
per-site heterozygosity, quantifying the
probability that a pair of chromosomes
drawn from the population will differ at a
nucleotide site. In order to calculate nucleotide
diversity for each chromosome, we
need to know the number of nucleotide
sites that were surveyed for variation, and
in methods like reduced representation sequencing,
we need to know the sequence
quality and the depth of coverage at each



and the probability of detecting a SNP if in
fact the alleles have different sequence (i.e.,
the probability of correct sequence calls). The
greater the depth of coverage and the higher
the sequence quality, the higher is the chance
of successfully detecting a SNP (105). Even
after correcting for variation in coverage, the
nucleotide diversity appeared to vary across
autosomes. The significance of this heterogeneity
was tested by analysis of variance, with
estimates of π for 100-kbp windows to estimate
variability within chromosomes (for the
Celera-PFP comparison, F = 29.73, P <
0.0001).

Average diversity for the autosomes estimated
from the Celera-PFP comparison
was 8.94 × 10^-4. Nucleotide diversity on
the X chromosome was 6.54 × 10^-4. The
X is expected to be less variable than autosomes,
because for every four copies of
autosomes in the population, there are only
three X chromosomes, and this smaller effective
population size means that random
drift will more rapidly remove variation
from the X (106).
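As a rough worked example of the quantities in section 6.3, the sketch below treats nucleotide diversity for a two-sequence comparison as differences divided by sites surveyed, and compares the reported X-to-autosome ratio with the 3:4 copy-number expectation. It is only an approximation under stated assumptions: the 2.3 Gbp figure is the alignment length quoted in section 6.1, no correction for coverage or sequence quality is applied, and the function names are invented for the illustration.

```python
def nucleotide_diversity(num_differences, num_sites):
    """Per-site heterozygosity when exactly two chromosomes are compared."""
    if num_sites <= 0:
        raise ValueError("num_sites must be positive")
    return num_differences / num_sites


def expected_x_to_autosome_ratio(x_copies=3, autosome_copies=4):
    """Neutral expectation from copy number alone (three X per four autosomes)."""
    return x_copies / autosome_copies


# Crude genome-wide figure from the counts quoted in section 6.1
# (2,104,820 putative SNPs over about 2.3 Gbp of alignment).
approx_pi = nucleotide_diversity(2_104_820, 2.3e9)

# Ratio of the reported X-chromosome and autosomal estimates.
observed_ratio = 6.54e-4 / 8.94e-4

print(f"approximate pi: {approx_pi:.2e}")
print(f"observed X/autosome ratio: {observed_ratio:.3f}")
print(f"expected X/autosome ratio: {expected_x_to_autosome_ratio():.3f}")
```

The crude quotient lands close to the reported genome-wide estimate, and the observed X-to-autosome ratio is slightly below the 0.75 expected from copy number alone.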

Having ascertained nucleotide variation
genome-wide, it appears that previous estimates
of nucleotide diversity in humans
based on samples of genes were reasonably
accurate (101, 102, 107). Genome-wide,
our estimate of nucleotide diversity was
8.98 × 10^-4 for the Celera-PFP alignment,
and a published estimate averaged over 10
densely resequenced human genes was
8.00 × 10^-4 (108).

6.4 Variation in nucleotide diversity 
across the human genome 
Such an apparently high degree of variability
among chromosomes in SNP density
raises the question of whether there is
heterogeneity at a finer scale within chromo-


Table 16. Summary of nucleotide changes in different SNP data sets.
SNP data set

Fig. 13. Segmental duplications between chromosomes
in the human genome. The 24 panels show
the 1077 duplicated blocks of genes, containing
10,310 pairs of genes in total. Each line represents
a pair of homologous genes belonging to a block;
all blocks contain at least three genes on each of
the chromosomes where they appear. Each panel
shows all the duplications between a single
chromosome and other chromosomes with shared
blocks. The chromosome at the center of each panel
is shown as a thick red line for emphasis. Other
chromosomes are displayed from top to bottom
within each panel ordered by chromosome number.
The inset (bottom, center right) shows a close-up
of one duplication between chromosomes 18 and 20,
expanded to display the gene names of 12 of the 64
gene pairs shown.
somes, and whether this heterogeneity is
greater than expected by chance. If SNPs
occur by random and independent mutations,
then it would seem that there ought to be a
Poisson distribution of numbers of SNPs in
fragments of arbitrary constant size. The observed
dispersion in the distribution of SNPs
in 100-kbp fragments was far greater than
predicted from a Poisson distribution (Fig.
14). However, this simplistic model ignores
the different recombination rates and population
histories that exist in different regions of
the genome. Population genetics theory holds
that we can account for this variation with a
mathematical formulation called the neutral
coalescent. Applying well-tested algorithms
for simulating the neutral coalescent
with recombination (110), and using an effective
population size of 10,000 and a per-base
recombination rate equal to the mutation
rate (111), we generated a distribution of numbers
of SNPs by this model as well (112). The
observed distribution of SNPs has a much larger
variance than either the Poisson model or the
coalescent model, and the difference is highly
significant. This implies that there is significant
variability across the genome in SNP density,
an observation that begs an explanation.
Several attributes of the DNA sequence
may affect the local density of SNPs, including
the rate at which DNA polymerase
makes errors and the efficacy of mismatch
repair. One key factor that is likely to be
associated with SNP density is the G+C
content, in part because methylated cytosines
in CpG dinucleotides tend to undergo
deamination to form thymine, accounting
for a nearly 10-fold increase in the
mutation rate of CpGs over other dinucleotides.
We tallied the GC content and nucleotide
diversities in 100-kbp windows
across the entire genome and found that the
correlation between them was positive (r =
0.21) and highly significant (P < 0.0001),
but G+C content accounted for only a
small part of the variation.
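To make the last point concrete: the share of variance in one variable that is linearly explained by another is the square of the correlation coefficient, so r = 0.21 corresponds to roughly 4% of the variation in windowed diversity. The helper functions below are an illustrative sketch (the names and the handling of non-ACGT characters are assumptions), not the code used for the genome-wide tally.

```python
from math import sqrt


def gc_content(seq):
    """Fraction of G or C among the unambiguous (A, C, G, T) bases of a window."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Variance explained by G+C content for the reported correlation.
variance_explained = 0.21 ** 2
print(f"r = 0.21 explains about {variance_explained:.1%} of the variance")
```

In use, `gc_content` would be applied to each 100-kbp window and `pearson_r` to the resulting list paired with the per-window diversity estimates.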

6.5 SNPs by genomic class
To test homogeneity of SNP densities
across functional classes, we partitioned
sites into intergenic (defined as >5 kbp
from any predicted transcription unit), 5'-UTR,
exonic (missense and silent), intronic,
and 3'-UTR for 10,239 known
genes, derived from the NCBI RefSeq database
and all human genes predicted from
the Celera Otto annotation. In coding regions,
SNPs were categorized as either silent,
for those that do not change amino
acid sequence, or missense, for those that
change the protein product. The ratio of
missense to silent coding SNPs in Celera-PFP,
TSC, and Kwok sets (1.12, 0.91, and
0.78, respectively) shows a markedly reduced
frequency of missense variants compared
with the neutral expectation, consistent
with the elimination by natural selection
of a fraction of the deleterious amino
acid changes (112). These ratios are comparable
to the missense-to-silent ratios of
0.88 and 1.17 found by Cargill et al. (101)
and by Halushka et al. (102). Similar results
were observed in SNPs derived from
Celera shotgun sequences (46).
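The silent versus missense distinction, and why ratios near 1 count as "markedly reduced", can be illustrated with the standard genetic code. The sketch below classifies a single-base codon change and then counts every possible single-base change from the 61 sense codons. The uniform model (equal codon usage, equal rates for all substitutions) is an assumption chosen for simplicity and is not necessarily the neutral expectation the authors had in mind; changes to a stop codon are reported separately as "nonsense", which is also a choice made here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon, alt_codon):
    """Return 'silent', 'missense', or 'nonsense' for a codon substitution."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    if "*" in (ref_aa, alt_aa):
        return "nonsense"  # gain or loss of a stop codon
    return "missense"


def uniform_model_counts():
    """Count all single-base changes from sense codons under a uniform model."""
    counts = {"silent": 0, "missense": 0, "nonsense": 0}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        for i, base in product(range(3), BASES):
            if base != codon[i]:
                alt = codon[:i] + base + codon[i + 1:]
                counts[classify_coding_snp(codon, alt)] += 1
    return counts


counts = uniform_model_counts()
ratio = counts["missense"] / counts["silent"]
print(counts, f"missense:silent = {ratio:.2f}")
```

Under this simple model missense changes outnumber silent ones by nearly three to one, so observed ratios of 0.78 to 1.12 imply that a large share of amino acid changes have been removed.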

It is striking how small is the fraction of 
SNPs that lead to potentially dysfunctional 
alterations in proteins. In the 10,239 RefSeq
genes, missense SNPs were only about




Number of SNPs / 100 kb

Fig. 14. SNP density in each 100-kbp interval as determined with Celera-PFP SNPs. The color codes
are as follows: black, Celera-PFP SNP density; blue, coalescent model; and red, Poisson distribution.
The figure shows that the distribution of SNPs along the genome is nonrandom and is not entirely
accounted for by a coalescent model of regional history.



0.12, 0.14, and 0.17% of the total SNP
counts in Celera-PFP, TSC, and Kwok
SNPs, respectively. Nonconservative protein
changes constitute an even smaller fraction
of missense SNPs (47, 41, and 40% in
Celera-PFP, Kwok, and TSC). Intergenic regions
have been virtually unstudied (113), and
we note that 75% of the SNPs we identified
were intergenic (Table 17). The SNP rate was
highest in introns and lowest in exons. The SNP
rate was lower in intergenic regions than in
introns, providing one of the first discriminatory

between these two classes of DNA. These SNP
rates were confirmed in the Celera SNPs, which
also exhibited a lower rate in exons than in
introns, and in extragenic regions than in introns
(46). Many of these intergenic SNPs will
provide valuable information in the form of
markers for linkage and association studies, and
some fraction is likely to have a regulatory
function as well.

7 An Overview of the Predicted 
Protein-Coding Genes In the Human 
Genome 

Summary. This section provides an initial
computational analysis of the predicted
protein set with the aim of cataloging
prominent differences and similarities
when the human genome is compared with
other fully sequenced eukaryotic genomes.
Over 40% of the predicted protein set in
humans cannot be ascribed a molecular
function by methods that assign proteins to
known families. A protein domain-based
analysis provides a detailed catalog of the
prominent differences in the human genome
when compared with the fly and
worm genomes. Prominent among these are
domain expansions in proteins involved in
developmental regulation and in cellular
processes such as neuronal function, hemostasis,
acquired immune response, and cytoskeletal
complexity. The final enumeration
of protein families and details of protein
structure will rely on additional experimental
work and comprehensive manual
curation.

A preliminary analysis of the predicted human
protein-coding genes was conducted.
Two methods were used to analyze and classify
the molecular functions of 26,588 predicted
proteins that represent 26,383 gene
predictions with at least two lines of evidence
as described above. The first method was
based on an analysis at the level of protein
families, with both the publicly available
Pfam database (114, 115) and Celera's Panther
Classification (CPC) (Fig. 15) (116).
The second method was based on an analysis
at the level of protein domains, with both the
Pfam and SMART databases (115, 117).

The results presented here are preliminary
and are subject to several limitations.
Both the gene predictions and functional
assignments have been made by using computational
tools, although the statistical
models in Panther, Pfam, and SMART have
been built, annotated, and reviewed by expert
biologists. In the set of computationally
predicted genes, we expect both false-positive
predictions (some of these may in fact be inactive
pseudogenes) and false-negative predictions
(some human genes will not be computationally
predicted). We also expect errors in
delimiting the boundaries of exons and genes.
Similarly, in the automatic functional assignments,
we also expect both false-positive and
false-negative predictions. The functional assignment
protocol focuses on protein families
that tend to be found across several organisms,
or on families of known human genes. Therefore,
we do not assign a function to many genes
that are not in large families, even if the function
is known. Unless otherwise specified, all
enumeration of the genes in any given family or
functional category was taken from the set of
26,588 predicted proteins, which were assigned
functions by using statistical score cutoffs defined
for models in Panther, Pfam, and
SMART.

For this initial examination of the predicted
human protein set, three broad questions
were asked: (i) What are the likely
molecular functions of the predicted gene
products, and how are these proteins categorized
with current classification methods?
(ii) What are the core functions that
appear to be common across the animals?
(iii) How does the human protein complement
differ from that of other sequenced
eukaryotes?

7.1 Molecular functions of predicted
human proteins

Figure 15 shows an overview of the putative
molecular functions of the predicted
26,588 human proteins that have at least
two lines of supporting evidence. About
41% (12,809) of the gene products could
not be classified from this initial analysis
and are termed proteins with unknown
functions. Because our automatic classification
methods treat only relatively large
protein families, there are a number of
"unclassified" sequences that do, in fact,
have a known or predicted function. For the
60% of the protein set that have automatic
functional predictions, the specific protein
functions have been placed into broad
classes. We focus here on molecular function
(rather than higher order cellular processes)
in order to classify as many proteins
as possible. These functional predictions
are based on similarity to sequences of
known function.

In our analysis of the 12,731 additional low-confidence
predicted genes (those with only
one piece of supporting evidence), only 636
(5%) of these additional putative genes were
assigned molecular functions by the automated
methods. One-third of these 636 predicted
genes represented endogenous retroviral proteins,
further suggesting that the majority of
these unknown-function genes are not real
genes. Given that most of these additional
12,095 genes appear to be unique among the
genomes sequenced to date, many may simply
represent false-positive gene predictions.

The most common molecular functions are
the transcription factors and those involved in
nucleic acid metabolism (nucleic acid enzyme).
Other functions that are highly represented in
the human genome are the receptors, kinases,
and hydrolases. Not surprisingly, most of the
hydrolases are proteases. There are also many
proteins that are members of proto-oncogene
families, as well as families of "select regulatory
molecules": (i) proteins involved in specific
steps of signal transduction such as heterotrimeric
GTP-binding proteins (G proteins) and
cell cycle regulators, and (ii) proteins that modulate
the activity of kinases, G proteins, and
phosphatases.

Table 17. Distribution of SNPs in classes of
genomic regions.
Genomic region class    Size of region

Fig. 15. Distribution of the molecular functions of
26,383 human genes. Each slice lists the numbers
and percentages (in parentheses) of human gene
functions assigned to a given category of molecular
function. The outer circle shows the assignment to
molecular function categories in the Gene Ontology
(GO) (179), and the inner circle shows the
assignment to Celera's Panther molecular function
categories (116).

7.2 Evolutionary conservation of core
processes

Because of the various "model organism"
genome-sequencing projects that have already
been completed, reasonable comparative
information is available for beginning the
analysis of the evolution of the human genome.
The genomes of S. cerevisiae ("bakers'
yeast") (118) and two diverse invertebrates,
C. elegans (a nematode worm) (119)
and D. melanogaster (fly) (26), as well as the
first plant genome, A. thaliana, recently completed
(92), provide a diverse background for
genome comparisons.

We enumerated the "strict orthologs" conserved
between human and fly, and between
human and worm (Fig. 16) to address the
question, What are the core functions that
appear to be common across the animals?
The concept of orthology is important because
if two genes are orthologs, they can be
traced by descent to the common ancestor of
the two organisms (an "evolutionarily conserved
protein set"), and therefore are likely
to perform similar, conserved functions in the
different organisms. It is critical in this analysis
to separate orthologs (a gene that appears
in two organisms by descent from a common
ancestor) from paralogs (a gene that appears
in more than one copy in a given organism by
a duplication event) because paralogs may
subsequently diverge in function. Following
the yeast-worm ortholog comparison in
(120), we identified two different cases for
each pairwise comparison (human-fly and
human-worm). The first case was a pair of
genes, one from each organism, for which
there was no other close homolog in either
organism. These are straightforwardly identified
as orthologous, because there are no
additional members of the families that complicate
separating orthologs from paralogs.
The second case is a family of genes with
more than one member in either or both of the
organisms being compared. Chervitz et al.
(120) deal with this case by analyzing a
phylogenetic tree that described the relationships
between all of the sequences in both
organisms, and then looked for pairs of genes
that were nearest neighbors in the tree. If the
nearest-neighbor pairs were from different
organisms, those genes were presumed to be
orthologs. We note that these nearest neighbors
can often be confidently identified from
pairwise sequence comparison without having
to examine a phylogenetic tree (see legend
to Fig. 16). If the nearest neighbors are
not from different organisms, there has been
a paralogous expansion in one or both organisms
after the speciation event (and/or a gene
loss by one organism). When this one-to-one
correspondence is lost, defining an ortholog
becomes ambiguous. For our initial computational
overview of the predicted human protein
set, we could not answer this question for
every predicted protein. Therefore, we consider
only "strict orthologs," i.e., the proteins
with unambiguous one-to-one relationships
(Fig. 16). By these criteria, there are 2758
strict human-fly orthologs, 2031 human-worm
(1523 in common between these sets).
We define the evolutionarily conserved set as
those 1523 human proteins that have strict
orthologs in both D. melanogaster and C.
elegans.
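The "strict ortholog" rule described here (reciprocal best hits between two organisms, with the cross-species hit also beating every paralog hit) can be sketched as follows. This is an assumed, simplified illustration rather than the pipeline used for Fig. 16: the input dictionaries, gene identifiers, scores, and the default P-value cutoff are all hypothetical, and real BLAST output would need to be parsed into this shape first.

```python
def _best_partner(cross_hits, side):
    """Map each gene on one side (0 = organism A, 1 = organism B) to its top-scoring partner."""
    best = {}
    for pair, (score, _p_value) in cross_hits.items():
        gene, partner = pair[side], pair[1 - side]
        if gene not in best or score > best[gene][1]:
            best[gene] = (partner, score)
    return {gene: partner for gene, (partner, _) in best.items()}


def _best_paralog_score(within_hits, gene):
    """Highest within-organism score involving `gene` (-inf if it has no paralog hit)."""
    return max(
        (score for pair, score in within_hits.items() if gene in pair),
        default=float("-inf"),
    )


def strict_orthologs(cross_hits, within_a, within_b, max_p=1e-10):
    """Return sorted (a, b) pairs that are reciprocal best hits, pass the
    P-value cutoff, and outscore every paralog hit in either organism.

    cross_hits: {(gene_a, gene_b): (score, p_value)}
    within_a / within_b: {(gene1, gene2): score} for same-organism hits
    """
    best_for_a = _best_partner(cross_hits, 0)
    best_for_b = _best_partner(cross_hits, 1)
    pairs = []
    for (a, b), (score, p_value) in cross_hits.items():
        if best_for_a[a] != b or best_for_b[b] != a:
            continue  # not a bidirectional best hit
        if p_value > max_p:
            continue  # similarity too weak
        if score <= _best_paralog_score(within_a, a):
            continue  # a paralog in organism A is at least as close
        if score <= _best_paralog_score(within_b, b):
            continue  # a paralog in organism B is at least as close
        pairs.append((a, b))
    return sorted(pairs)


# Hypothetical human (hs*) versus fly (dm*) hits.
cross = {
    ("hsA", "dmA"): (500, 1e-120), ("hsA", "dmB"): (80, 1e-12),
    ("hsB", "dmB"): (300, 1e-80), ("hsC", "dmC"): (45, 1e-4),
    ("hsD1", "dmD"): (400, 1e-100), ("hsD2", "dmD"): (390, 1e-98),
}
human_paralogs = {("hsD1", "hsD2"): 600}  # recent duplication in human
fly_paralogs = {}

orthologs = strict_orthologs(cross, human_paralogs, fly_paralogs)
print(orthologs)
```

The conserved set shared with a third organism would then be the intersection of the human genes appearing in the human-fly and human-worm pair lists.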

The distribution of the functions of the
conserved protein set is shown in Fig. 16.
Comparison with Fig. 15 shows that, not
surprisingly, the set of conserved proteins is
not distributed among molecular functions in
the same way as the whole human protein set.
Compared with the whole human set (Fig.
15), there are several categories that are overrepresented
in the conserved set by a factor of
~2 or more. The first category is nucleic acid
enzymes, primarily the transcriptional machinery
(notably DNA/RNA methyltransferases,
DNA/RNA polymerases, helicases,
DNA ligases, DNA- and RNA-processing
factors, nucleases, and ribosomal proteins).
The basic transcriptional and translational
machinery is well known to have been conserved
over evolution, from bacteria through
to the most complex eukaryotes. Many ribonucleoproteins
involved in RNA splicing also
appear to be conserved among the animals.
Other enzyme types are also overrepresented
(transferases, oxidoreductases, ligases,
lyases, and isomerases). Many of these en-



Fig. 16. Functions of putative orthologs across
vertebrate and invertebrate genomes. Each slice
lists the number and percentages (in parentheses)
of "strict orthologs" between the human, fly, and
worm genomes involved in a given category of
molecular function. "Strict orthologs" are defined
here as bi-directional BLAST best hits (180) such
that each orthologous pair (i) has a BLASTP
P-value of ≤10^-10 (120), and (ii) has a more
significant BLASTP score than any paralogs in
either organism, i.e., there has likely been no
duplication subsequent to speciation that might
make the orthology ambiguous. This measure is
quite strict and is a lower bound on the number of
orthologs. By these criteria, there are 2758 strict
human-fly orthologs, and 2031 human-worm
orthologs (1523 in common between these sets).



cytoskeletal structural protein (20, 1.2%)
chaperone (16, 0.9%)
cell adhesion (11, 0.6%)
miscellaneous (71, 4.2%)
viral protein (4, 0.2%)
transfer/carrier protein (11, 0.6%)
transcription factor (81, 4.7%)
nucleic acid enzyme (221, 12.9%)
receptor (23, 1.3%)
kinase (69, 4.0%)
select regulatory molecule (88, 5.1%)
transferase (70, 4.1%)
extracellular matrix (12, 0.7%)
ion channel (7, 0.4%)
motor (13, 0.8%)
structural protein of muscle (8, 0.5%)
protooncogene (23, 1.3%)
intracellular transporter (51, 3.0%)
transporter (44, 3.6%)
synthase and synthetase (64, 3.7%)
oxidoreductase (64, 3.7%)
lyase (12, 0.7%)
ligase (9, 0.5%)
molecular function unknown (613, 35.8%)
hydrolase (80, 4.7%)
isomerase (21, 1.2%)
zymes are involved in intermediary metabolism.
The only exception is the hydrolase
category, which is not significantly overrepresented
in the shared protein set. Proteases
form the largest part of this category, and
several large protease families have expanded
in each of these three organisms after their
divergence. The category of select regulatory
molecules is also overrepresented in the conserved
set. The major conserved families are
small guanosine triphosphatases (GTPases)
(especially the Ras-related superfamily, including
ADP ribosylation factor) and cell
cycle regulators (particularly the cullin family,
cyclin C family, and several cell division
protein kinases). The last two significantly
overrepresented categories are protein transport
and trafficking, and chaperones. The
most conserved groups in these categories are
proteins involved in coated vesicle-mediated
transport, and chaperones involved in protein
folding and heat-shock response [particularly
the DNAJ family, and heat-shock protein
60 (HSP60), HSP70, and HSP90 families].
These observations provide only a conservative
estimate of the protein families in the
context of specific cellular processes that
were likely derived from the last common
ancestor of the human, fly, and worm. As
stated before, this analysis does not provide a
complete estimate of conservation across the
three animal genomes, as paralogous duplication
makes the determination of true orthologs
difficult within the members of conserved
protein families.



7.3 Differences between the human 
genome and other sequenced 
eukaryotic genomes 

To explore the molecular building blocks of
the vertebrate taxon, we have compared the
human genome with the other sequenced
eukaryotic genomes at three levels: molecular
functions, protein families, and protein
domains.

Molecular differences can be correlated
with phenotypic differences to begin to reveal
the developmental and cellular processes that
are unique to the vertebrates. Tables 18 and
19 display a comparison among all sequenced
eukaryotic genomes, over selected protein/domain
families (defined by sequence similarity,
e.g., the serine-threonine protein kinases)
and superfamilies (defined by shared
molecular function, which may include several
sequence-related families, e.g., the cytokines).
In these tables we have focused on
(super) families that are either very large or
that differ significantly in humans compared
with the other sequenced eukaryote genomes.
We have found that the most prominent human
expansions are in proteins involved in (i)
acquired immune functions; (ii) neural development,
structure, and functions; (iii) intercellular
and intracellular signaling pathways
in development and homeostasis; (iv) hemostasis;
and (v) apoptosis.

Acquired immunity. One of the most
striking differences between the human genome
and the Drosophila or C. elegans genome
is the appearance of genes involved in
acquired immunity (Tables 18 and 19). This
is expected, because the acquired immune
response is a defense system that only occurs
in vertebrates. We observe 22 class I and 22
class II major histocompatibility complex
(MHC) antigen genes and 114 other immunoglobulin
genes in the human genome. In
addition, there are 59 genes in the cognate
immunoglobulin receptor family. At the domain
level, this is exemplified by an expansion
and recruitment of the ancient immunoglobulin
fold to constitute molecules such as
MHC, and of the integrin fold to form several
of the cell adhesion molecules that mediate
interactions between immune effector cells
and the extracellular matrix. Vertebrate-specific
proteins include the paracrine immune
regulators family of secreted 4-alpha helical
bundle proteins, namely the cytokines and
chemokines. Some of the cytoplasmic signal
transduction components associated with cytokine
receptor signal transduction are also
features that are poorly represented in the fly
and worm. These include protein domains
found in the signal transducer and activator of
transcription (STATs), the suppressors of cytokine
signaling (SOCS), and protein inhibitors
of activated STATs (PIAS). In contrast,
many of the animal-specific protein domains
that play a role in innate immune response,
such as the Toll receptors, do not appear to be
significantly expanded in the human genome.

Neural development, structure, and
function. In the human genome, as compared
with the worm and fly genomes, there is a
marked increase in the number of members
of protein families that are involved in
neural development. Examples include neurotrophic
factors such as ependymin, nerve
growth factor, and signaling molecules
such as semaphorins, as well as the number
of proteins involved directly in neural
structure and function such as myelin proteins,
voltage-gated ion channels, and synaptic
proteins such as synaptotagmin.
These observations correlate well with the
known phenotypic differences between the
nervous systems of these taxa, notably (i)
the increase in the number and connectivity
of neurons; (ii) the increase in number of
distinct neural cell types (as many as a
thousand or more in human compared with
a few hundred in fly and worm) (121); (iii)
the increased length of individual axons;
and (iv) the significant increase in glial cell
number, especially the appearance of myelinating
glial cells, which are electrically
inert supporting cells differentiated from
the same stem cells as neurons. A number
of prominent protein expansions are involved
in the processes of neural development.
Of the extracellular domains that mediate
cell adhesion, the connexin domain-containing
proteins (122) exist only in humans.
These proteins, which are not present
in the Drosophila or C. elegans genomes,
appear to provide the constitutive subunits
of intercellular channels and the structural
basis for electrical coupling. Pathway finding
by axons and neuronal network formation
is mediated through a subset of ephrins
and their cognate receptor tyrosine kinases
that act as positional labels to establish
topographical projections (123). The probable
biological role for the semaphorins (22
in human compared with 6 in the fly and 2
in the worm) and their receptors (neuropilins
and plexins) is that of axonal guidance
molecules (124). Signaling molecules such
as neurotrophic factors and some cytokines
have been shown to regulate neuronal cell
survival, proliferation, and axon guidance
(125). Notch receptors and ligands play
important roles in glial cell fate determination
and gliogenesis (126).

Other human expanded gene families play key roles directly in neural structure and function. One
example is synaptotagmin (expanded more than twofold in humans relative to the invertebrates),
originally found to regulate synaptic transmission by serving as a Ca2+ sensor (or receptor)
during synaptic vesicle fusion and release (127). Of interest is the increased co-occurrence in
humans of PDZ and the SH3 domains in neuronal-specific adaptor molecules; examples include
proteins that likely modulate channel activity at synaptic junctions (128). We also noted
expansions in several ion-channel families (Table 19), including the EAG subfamily (related to
cyclic nucleotide gated channels), the voltage-gated calcium/sodium channel family, the
inward-rectifier potassium channel family, and the voltage-gated potassium channel, alpha subunit
family. Voltage-gated sodium and potassium channels are involved in the generation of action
potentials in neurons. Together with voltage-gated calcium channels, they also play a key role in
coupling action potentials to neurotransmitter release, in the development of neurites, and in
short-term memory. The recent observation of a calcium-regulated association between sodium
channels and synaptotagmin may have consequences for the establishment and regulation of neuronal
excitability (129).

Myelin basic protein and myelin-associated glycoprotein are major classes of protein components in
both the central and peripheral nervous system of vertebrates. Myelin P0 is a major component of
peripheral myelin, and myelin proteolipid and myelin oligodendrocyte glycoprotein are found in the
central nervous system. Mutations in any of these
Table 18. Domain-based comparative analysis of proteins in H. sapiens (H), D. melanogaster (F),
C. elegans (W), S. cerevisiae (Y), and A. thaliana (A). The predicted protein set of each of the
above eukaryotic organisms was analyzed with Pfam version 5.5 using E value cutoffs of 0.001.
[illegible] Domains were categorized into cellular processes for presentation. Some domains
(i.e., SH2) are listed in [illegible] owing to the limitations of large-scale automatic
[illegible]. Representative examples of domains with reduced [illegible] this analysis are marked
with a double asterisk. Examples include short divergent and predominantly alpha-helical domains
[illegible] of cysteine-rich zinc finger proteins.

Accession number

N Genome



. Pf02039 
Pf00212 
PF00028 
PF00214 
PF01110 
PF01093 
• PF00029 
PF00976 
PF00473 
PF00007 
PF00778 
PF00322 
PF00812 
PF01404 
PF00167 
, PF01534 
PF0O236 
PF01153 
PF01271 
PF020S8 
PF00O49 
PF00219 
PF02024 
PF00193 
PF00243 
PF02158 
PF06l84 
' PF02070 
PF00066 
PF00865 
PF00159 
PF01279 
PF00123 
PF00341 
PF014O3 
PF01033 
PF00id3 
PF02208 
PF02404 
PF01034 
PF00b20 
PF00019 
PF01099 
'PF01160 

PFoono 



Adrehomedullin 
ANP 
Cadherin 
CalcLCGRPJAPP 
CNTF * 
^.Clusterin ' ■ 
Connexin 
ACTH^domain 
• CRF 
Cys^knot 
DJX 

Fndothelin 
Ephrin • • 
EPhJbd 
FCF 
Frinled 
Hormones 
Clypican 
Cranin 
Cuanylln 
Insulin 
IGFBP 
Lepttn 
XlinJc 
NCF 

Neuregulin 
Hormones 

Nmu 

Notch 

Osteopontin 
Hormone3 
Parathyroid 
Hormone2 
PDGF 
Sema 

Somatomedin^B- ' ' 
Hormone 
Sorb 

SCF * • 
Syndecan 
TNFfLce 

TCF-p . r 

literoglobin 
Op!dds_neuropep • 

wnt : ■ 



PF01821 
PF00386 
PF0020d 
Pf00754 
PF01410 
.PF00039 
PF00040 ' 
PF00051 
PF01823 
PF00354 
PF00277 
PF00084 
PF02210 
PF01108 
PF00868 
PF00927 



ANATO
C1q
Disintegrin
F5_F8_type_C
COLFI
Fn1
Fn2
Kringle
MACPF
Pentaxin
SAA_proteins
Sushi
TSPN
Tissue_fac
Transglutamin_N
Transglutamin_C



Adrenomedullin
Atrial natriuretic peptide
Cadherin domain
Calcitonin/CGRP/IAPP family
Ciliary neurotrophic factor
Clusterin
Connexin
Corticotropin ACTH domain
Corticotropin-releasing factor family
Cystine-knot domain
Dix domain
Endothelin family
Ephrin
Ephrin receptor ligand binding domain
Fibroblast growth factor
Frizzled/Smoothened family membrane region
Glycoprotein hormones
Glypican
Granin (chromogranin or secretogranin)
Guanylin precursor
Insulin/IGF/Relaxin family
Insulin-like growth factor binding proteins
Leptin
LINK (hyaluron binding)
Nerve growth factor family
Neuregulin family
Neurohypophysial hormones
Neuromedin U
Notch (DSL) domain
Osteopontin
Pancreatic hormone peptides
Parathyroid hormone family
Peptide hormone
Platelet-derived growth factor (PDGF)
Sema domain
Somatomedin B domain
Somatotropin
Sorbin homologous domain
Stem cell factor
Syndecan domain
TNFR/NGFR cysteine-rich region
Transforming growth factor β-like domain
Uteroglobin family
Vertebrate endogenous opioids neuropeptide
Wnt family of developmental signaling proteins

Hemostasis

Anaphylotoxin-like domain
C1q domain
Disintegrin
F5/8 type C domain
Fibrillar collagen C-terminal domain
Fibronectin type I domain
Fibronectin type II domain
Kringle domain
MAC/Perforin domain
Pentaxin family
Serum amyloid A protein
Sushi domain (SCR repeat)
Thrombospondin N-terminal-like domains
Tissue factor
Transglutaminase family
Transglutaminase family
Accession number
Domain name
Domain description
W

PF00594 Gla

PF00711 
Pf00748 
Pf00666 
.PF0pi29 

r 

- PF00993 
Pf00969 

PF01109 
PF00047 
PF00143 
PF00714 
PF00726 
PF02372 
PF00715 
PF00727 
..PF02025 
PF01415 
PF00340 
PF02394 
PFO2059 
PF004B9 
PF01291 

PF0O323 
PF01091 
PFO6277 
PF0004S 

.PF01582 
PF00229 
PFOOOSd 

PF00779 
PF00168 
PF00609 
PF00781 
PF00610 



PF01363 
PF00996 
PF0O503 
PF00631 
PF00616 
PF00618 

PF00625 
PF02189 
PFa0169 
PF00130 

PF0038d 

PF00387 



Defensin_beta
Calpain_inhib
Cathelicidins
MHC_I
MHC_II_alpha**
MHC_II_beta**
Defensin_propep
GM_CSF
ig
Interferon
IFN-gamma
IL10
IL15
IL2
IL4
IL5
IL7
IL1
IL1_propep
IL3
IL6
LIF_OSM
Defensins
PTN_MK
SAA_proteins
IL8
TIR
TNF
Trefoil

BTK
C2
DAGKa
DAGKc
DEP
FYVE
GDI
G-alpha
G-gamma
RasGAP
RasGEFN
Guanylate_kin
ITAM
DAG_PE-bind
PI-PLC-X
PI-PLC-Y
PID
PI3K_p85B
PI3K_rbd
ArfGAP
RBD
Rap_GAP
RA
Ras
RasGEF
RGS
RIIa



Vitamin K-dependent carboxylation/gamma-carboxyglutamic (GLA) domain

Immune response

Beta defensin
Calpain inhibitor repeat
Cathelicidins
Class I histocompatibility antigen, domains alpha 1 and 2
Class II histocompatibility antigen, alpha domain
Class II histocompatibility antigen, beta domain
Defensin propeptide
Granulocyte-macrophage colony-stimulating factor
Immunoglobulin domain
Interferon alpha/beta domain
Interferon gamma
Interleukin-10
Interleukin-15
Interleukin-2
Interleukin-4
Interleukin-5
Interleukin-7/9 family
Interleukin-1
Interleukin-1 propeptide
Interleukin-3
Interleukin-6/G-CSF/MGF family
Leukemia inhibitory factor (LIF)/oncostatin (OSM) family
Mammalian defensin
PTN/MK heparin-binding protein
Serum amyloid A protein
Small cytokines (intecrine/chemokine), interleukin-8 like
TIR domain
TNF (tumor necrosis factor) family
Trefoil (P-type) domain

PI-PY-rho GTPase signaling

BTK motif
C2 domain
Diacylglycerol kinase accessory domain (presumed)
Diacylglycerol kinase catalytic domain (presumed)
Domain found in Dishevelled, Egl-10, and Pleckstrin (DEP)
FYVE zinc finger
GDP dissociation inhibitor
G-protein alpha subunit
G-protein gamma like domains **
GTPase-activator protein for Ras-like GTPase
Guanine nucleotide exchange factor for Ras-like GTPases; N-terminal motif
Guanylate kinase
Immunoreceptor tyrosine-based activation motif
PH domain
Phorbol esters/diacylglycerol binding domain (C1 domain)
Phosphatidylinositol-specific phospholipase C, X domain
Phosphatidylinositol-specific phospholipase C, Y domain
Phosphotyrosine interaction domain (PTB/PID)
PI3-kinase family, p85-binding domain
PI3-kinase family, ras-binding domain
Putative GTP-ase activating protein for Arf
Raf-like Ras-binding domain
Rap/ran-GAP
Ras association (RalGDS/AF-6) domain
Ras family
RasGEF domain
Regulator of G protein signaling domain
Regulatory subunit of type II PKA R-subunit



n 



..3(9) 
- -• * - 2 
'18(20) 

5(6) 
7 
3 
1 

381 (930) 
7(9) 



2 
2 

2 
2 
4 

32 

■ 18 
• 12 
5(6) 

5 

73(101)-- 
9 
10 
12(13) 

28 (30) 
6 

27(30) 
16 
11 
9 

.12 
3 

193(212) 
45(56) 

12 

11 . 

24.(27) 
2 
6 
16 
6(7) 
5 

18(19) 
126 
2.1 
27 
4 



0 
0 
0 

^ V- : . 0 
.0 

o 

0 
0 

125(291) 
0 
O 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 

0 
0 
0 
0 

8 
0 

1 

32(44) 
4 
8 
4 

14 

2 
10 
• 5 

5 

2 



0 


0 


u 


0 


. 0 


u 


: : 0. . 


■* 0 




.,';0" 


0 


u 


0 


0 


u 


0 


0' 


u 


0 


0 


A 


0 


0 




67(323) 


0 


0 


0 


0 


0 


0 


0 


Q 


0 


0 


0 


0 


0 


Q 


0 


0- 


0 


0 


0 


0 


0 


0 


0 


0 


0 


Q 


0 


0 


0 


0 


0 


^ 0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


2- 


b 


131 (143) 


0 


0 


. 0 


2- 


0 


0 



.0 

24(35) 
7 
8 
10 

15 
1 

20(23) 
5 
8 
3 



0 

6(9) 
0 
2 
5 

5 
1 
2 
1 
3 
5 



0 

66 (90) 

11(12) 
2 

15 
3 
5 
0 
0 
O 



8 


7 


1 


4 


0 


0 


0 


0 


72(78) 


65(68) 


24 


. 23 


25(31) 


26(40) 


1(2) ; 


4 


3 


7 


1 


8 


.••* 2 


7 


1 


8 


13 


11(12) 


0 


0 


1 


1 


0 


0 


3 


1 


0 


0 


9 


8 


*6 


15 


4 


1 


0 


0 


4 . 


2 


0 


0 


7(9) 


6 


1 


0 


56(57) 


51 


23 


78 


8 




5 


0 


6(7) 


12(13) 


1 


0 


1 


2 


1 


0 
Accession number
Domain name
Domain description

PF00620 RhoGAP
PF00621 RhoGEF
PF00536 SAM
PF01369 Sec7
PF00017 SH2
PF00018 SH3
PF01017 STAT
PF00790 VHS
PF00568 WH1
PF00452 Bcl-2
PF02180 BH4
PF00619 CARD
PF00531 Death
PF01335 DED
PF02179 BAG
PF00656 ICE_p20
PF00653 BIR



PF00022 
PF00191 
PF00402 
PF00373 
PF00880 
. PF00681 
PF00435 
PF00418 
PF00992 
PF02209 
PF01044 

PF01391 
• PF01413 

PF00431 
PF00008 
PF00147 

PF00041 
Pf0O7S7 
PF00357 
PF00362 
PF00052 
PF00053 
PF00054 
PF00055 
PF00059 
PF01463 
^ PF01462 
* PF00057 
PF00058 
PF00530 
.PF00084 
PF00090 ' 
PF00092 
PF00093 
PF00094 

PF06244 . 
PF00023 
PF00514 
PF00168 
PFO0O27 
PF01556 
PF00226 
PF00036 

PFooen 

PF01846 
PF00498 



Actin
Annexin
Calponin
Band_41
Nebulin_repeat
Plectin_repeat
Spectrin
Tubulin-binding
Troponin
VHP
Vinculin

Collagen
C4
CUB
EGF
Fibrinogen_C
Fn3
Furin-like
Integrin_A
Integrin_B
Laminin_B
Laminin_EGF
Laminin_G
Laminin_Nterm
Lectin_c
LRRCT
LRRNT
Ldl_recept_a
Ldl_recept_b
SRCR
Sushi
tsp_1
Vwa
Vwc
Vwd

14-3-3
Ank
Armadillo_seg
C2
cNMP_binding
DnaJ_C
DnaJ
Efhand**
FCH
FF
FHA



RhoGAP domain
RhoGEF domain
SAM domain (Sterile alpha motif)
Sec7 domain
Src homology 2 (SH2) domain
Src homology 3 (SH3) domain
STAT protein
VHS domain
WH1 domain

Domains involved in apoptosis

Bcl-2
Bcl-2 homology region 4
Caspase recruitment domain
Death domain
Death effector domain
Domain present in Hsp70 regulators
ICE-like protease (caspase) p20 domain
Inhibitor of Apoptosis domain

Cytoskeletal

Actin
Annexin
Calponin family
FERM domain (Band 4.1 family)
Nebulin repeat
Plectin repeat
Spectrin repeat
Tau and MAP proteins, tubulin-binding
Troponin
Villin headpiece domain
Vinculin family

ECM adhesion

Collagen triple helix repeat (20 copies)
C-terminal tandem repeated domain in type 4 procollagen
CUB domain
EGF-like domain
Fibrinogen beta and gamma chains, C-terminal globular domain
Fibronectin type III domain
Furin-like cysteine rich region
Integrin alpha cytoplasmic region
Integrins, beta chain
Laminin B (Domain IV)
Laminin EGF-like (Domains III and V)
Laminin G domain
Laminin N-terminal (Domain VI)
Lectin C-type domain
Leucine rich repeat C-terminal domain
Leucine rich repeat N-terminal domain
Low-density lipoprotein receptor domain class A
Low-density lipoprotein receptor repeat class B
Scavenger receptor cysteine-rich domain
Sushi domain (SCR repeat)
Thrombospondin type 1 domain
von Willebrand factor type A domain
von Willebrand factor type C domain
von Willebrand factor type D domain

Protein interaction domains

14-3-3 proteins
Ank repeat
Armadillo/beta-catenin-like repeats
C2 domain
Cyclic nucleotide-binding domain
DnaJ C terminal region
DnaJ domain
EF hand
Fes/CIP4 homology domain
FF domain
FHA domain



H 

59 
46 
29(31) 
13 

. 87(95) 
143 (182) 
7 
4 
7 

' 9 

• 16 
16 
4(5) 
5(8) 
11 
8(14) 

61(64) 
16(55) 
13(22) 
29(30) 
4(148) 
2(11) 
31 (195) 
4(12) 
4 
5 
4 

65(279) 
.6(11) 



19 

23(24) 
15 
5 

33(39) 
55(75) 
1 
2 

2 • 
2 

■ ' 0 
0 
5 
0 
3 
7 

5(9) 

15(16) 
4(16) 
3 

17(19) 
1(2) 
0 

13(171) 

• 1(4) 
6 

2 • 
2 



-47(69) 
108(420) 
26 



/ 10(46) 
2(4) 

9(47) 
45(186) 
10(11) 



106(545) 
5 
3 
8 

8(12) 
24(126) 
30(57) 
10 
47(76) 
69(81) 

40 (44) 
35 (127) 

15(96) 
.11(46) 
53(191) 

41 (66) 
34(58) 
19(28) 
15(35) 



' 20 
145(404) 
22(56) 
73(101) 
26(31) 
. 12 
44 

83(151) 
9 

4(11) 
13 



42(168) 
2 
1 

• - 4(7) 
?{621 
18(42) 
6 

23(24) 
23(30) 
7(13) 
33(152) 
9(56) 
4(8) 
11(42) 
, ,11(23) 
0 . 
6(11) 
.3(7) 

3 

72(269) 
11(38) . 
32(44) 
21(33) 
9 
34 

64(117) 

4(10) 
15 



W 

20 
18(19) 
8 
5 

44 [48) 
46(61) 

i(2) 
4 

.2(3) 

1 
1 
•2 . 
7 
0 
2 
3 

2(3) 

12 
4(11) 
7(19) 
11(14) 
1 
0 

10(93) 
2(8) 
8 
2 
1. 



..1f4<384) 
3(6) 

43(6?) 
54(157) 
6 

34(156) 
1 
2 
2 

6(10) 
11(65) 
14(26) . 
4 

91 (132) 
7(9) 
,3(6) 
27(113) 
7(22) 
1(2) 
8(45) 
18(47) 
17(19) 
2(5) 
9 • 

3 

75(223) 
3(11) 
24(35) 
15(20) 
5 
33 
41(86) 

3(16) 
7 



9 

3 
3 
5 
1 

23(27). 
0 
4 
1 

0 
. 0 
0 
0 
0 

1 

0 

1(2) 

9fll) 
0 
0 
0 
0 
0 
0 
0 
0 
0 

p 

. 0 
0 

0 
0 
0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 
0 
0 
0 
0 
0 
0 
0 
0 
0 



8 
0 
- 6 
9 • 
3 
4 

0 . 

8 

0 

0 
.0 
0 
0 
0 
5 
0 
0 



24 
6(16) 
0 
0 
O 
0 
0 

o 

0 
5 

0. 

0 . 
0 

0 
1 
0 

1 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 

o . 

0 
0 
0 

1 

0 
0 



2 

12(20) 
2(10) 
6(9) 
2(3) 
3 
20 



15 

66 (111) 
25(67) 
66(90) 
22 
19 
93 



4(11) 120(328) 

4 , V 0 

2(5) 4(8) 

13(14) 17 



340 
myelin proteins result in severe demyelination, which is a pathological condition in which the
myelin is lost and the nerve conduction is severely impaired (130). Humans have at least 10 genes
belonging to four different families involved in myelin production (five myelin P0, three myelin
proteolipid, myelin basic protein, and myelin-oligodendrocyte glycoprotein, or MOG), and possibly
more-remotely related members of the MOG family. Flies have only a single myelin proteolipid, and
worms have none at all.

THE HUMAN GENOME

Table 18 (Continued)



Intercellular and intracellular signaling pathways in development and homeostasis. Many protein
families that have expanded in humans relative to the invertebrates are involved in signaling
processes, particularly in response to development and differentiation



Accession number

Domain name

Domain description



Y 



PF00254 

PfOl590 

Pf01344 

Pf00560 

PF00917 

Pf00989 

PF0059S 

Pf00169 

PF01535 

PF00S36 

PF01369 

PF00017 

PF00018 

PF01740 

PF00515 

PF00400 

PF00397 

PF00569 

PF01754 

PF01383 

PF01426 

PF00643 

PF00533 

PF0d439 

PF00651 

PF00145* 

PF00385 

PF00125 

PF00134 

PF00270 

PF01529 

PF00546 

PF0O250 

PF0O320 

PF01585 

PF00010 

PFOO850 

PF00046 

PF01833 

PF02373 

PF02375 

P^p0013 

?F01352 

?F00104 

?F0O412 

'F00917 

'F00249 ' 

'F02344 

'F01753 

>F00628 

'F0O157 

'F02257 

'F00076 



FKBP
GAF
Kelch
LRR**
MATH
PAS
PDZ
PH
PPR**
SAM
Sec7
SH2
SH3
STAS
TPR**
WD40**
WW
ZZ

zf-A20
ARID
BAH
zf-B_box**
BRCT
Bromodomain
BTB
DNA_methylase
Chromo
Histone
Cyclin
DEAD
zf-DHHC
F-box**
Fork_head
GATA
G-patch
HLH**
Hist_deacetyl
Homeobox
TIG
JmjC
JmjN
KH-domain
KRAB
Hormone_rec
LIM
MATH
Myb_DNA-binding
Myc-LZ
zf-MYND
PHD
Pou
RFX_DNA_binding
Rrm

PF02037 SAP
PF00622 SPRY
PF01852 START
PF00907 T-box



FKBP-type peptidyl-prolyl cis-trans isomerases
GAF domain
Kelch motif
Leucine Rich Repeat
MATH domain
PAS domain
PDZ domain (Also known as DHR or GLGF)
PH domain
PPR repeat
SAM domain (Sterile alpha motif)
Sec7 domain
Src homology 2 (SH2) domain
Src homology 3 (SH3) domain
STAS domain
TPR domain
WD40 domain
WW domain
ZZ-Zinc finger present in dystrophin, CBP/p300

Nuclear interaction domains

A20-like zinc finger
ARID DNA binding domain
BAH domain
B-box zinc finger
BRCA1 C Terminus (BRCT) domain
Bromodomain
BTB/POZ domain
C-5 cytosine-specific DNA methylase
'chromo' (CHRromatin Organization MOdifier) domain
Core histone H2A/H2B/H3/H4
Cyclin
DEAD/DEAH box helicase
DHHC zinc finger domain
F-box domain
Fork head domain
GATA zinc finger
G-patch domain
Helix-loop-helix DNA-binding domain
Histone deacetylase family
Homeobox domain
IPT/TIG domain
JmjC domain
JmjN domain
KH domain
KRAB box
Ligand-binding domain of nuclear hormone receptor
LIM domain containing proteins
MATH domain
Myb-like DNA-binding domain
Myc leucine zipper domain
MYND finger
PHD-finger
Pou domain - N-terminal to homeobox domain
RFX DNA-binding domain
RNA recognition motif (a.k.a. RRM, RBD, or RNP domain)
SAP domain
SPRY domain
START domain
T-box



1.5(20) 
7(8) 
54(157) 
25(30) 
11 

18(19) 
96 (154) 
193(212) 
5 

29(31) 
13 
87(95) 
143 (182) 
5 

72(131) 
136(305) 
32(53) 
10(11) 

2(8) 
11 
8(10) 
32(35) 
17(28) 
37(48) 
97(98) 
3(4) 
24(27) 



7(8) 
2(4) 
12(48) 
24(30) 
5 

9(10) 
60(87) 
72(78) 
3(4) 
15 
5 

33(39) 
55(75) 
1 

39(101) 
98 (226) 
24(39) 
13 

2 
6 

7(8) 
1 

10(18) 
16(22) 
62(64} 
1 

14(15) 





*. 4 


24(29) 


1 


0 


10 


13(41) 


3 


102 (178) 


7(11) 


1 


15(16) 


88(161) 


1 


61 (74) 


6 


1 


13(18) 


46(66) 


2 


5 


65 (68) 


24 


23 


0 


1 


474 (2485) 


8 


3 


6 


5 


5 


9 


44(48) 




3 


46(61) 


23(27) 


4 


6 


2 


13 


28(54) 


16(31) 


e5(124) 


72(153) 


56(121) 


167(344) 


16(24) 


5(8) 


11(15) 


10 


2 


10 



2 
4 

. 4(5) 

. 23 (35) 
'18(26) 
•86 (91) 

a. 

17(18) 



0 
2 
5 
0 

10(16} 
10(15} 

1 (2) 
0 

'\ 1(2) 



8 

21(25) 
0 

12(16) 
28 
30 (31) 
13(15) 
12 



75(81).. 


5 


71 (73) 


8 


48 


19 


10 


10 


11 


35 


63(66) 


46(50) 


55(57) 


50(52) 


84(87) 


15 


20 


16 


7 


22 


16 


• 15 


309 (324) 


9 


165(167) 


35 (36) 


20(21) 


15 


4 


0 


11(17) 


5(6) 


8(10) 


9 


26 


18 


16 


13 


4 


14(15} 


60(61) 


44 


24 


4 


39 


12 


5(6) 


: 8(10) 


5 


10 


160(178) 


100(103) 


' 82(84) 


6 


66 


29(53) 


11(13) 


5(7) 


2 


1 


10 


4 


6 


4 


7 


7 


4 


2 


3 


7 


28(67) 


14(32) 


17(46) 


4(14) 


27(61) 


204(243) 


0 


0 


0 


0 


47 


17 


142(147) 


0 . 


0 



62 (129) 


33(83) 


33(79} 


4(7) 


10(16} 


11 


.5 


88 (161) 


1 


61 (74) 


.32(43) 


18(24) 


17(24) 


15(20) 


243 (401) 


. 1 


. 0 


0 


0 


0 


. ; 14 


14 


9 


1 


7 


68 (86) 


40(53} 


32(44) 


14(15) 


96 (105) 


15 


5 


4 


0 


0 


7 


2 


1 


1 


0 


224(324) 


127(199} 


94(145) 


43(73} 


232 (369) 


15 


8 


5 


5 


-6(7} 


44(51) 


10(12) 


5(7) 


3 


6 


10* 


2 


6 


0 


23 


17(19) 


8 


22 


0 


0 
Accession number

Domain name

Domain description



H 



W 



PF02135
PF01285
PF02176
PF00352
PF00567
PF00642
PF00096
PF00097
PF00098



zf-TAZ
TEA
zf-TRAF
TBP
TUDOR
zf-CCCH
zf-C2H2**
zf-C3HC4
zf-CCHC



TAZ finger
TEA domain
TRAF-type zinc finger
Transcription factor TFIID (or TATA-binding protein, TBP)
TUDOR domain
Zinc finger, C-x8-C-x5-C-x3-H type (and similar)
Zinc finger, C2H2 type
Zinc finger, C3HC4 type (RING finger)
Zinc knuckle



2(3) 
A 

6(9) 
2(4) 

.9(24). 
17(22) 
564(4500) • 
135(137) 
9(17) 



1(2J 
1 

1(3) 
4(8) 

9(19) ^ 
6(8) 
234(771) 
57 
6(10) 



6(7) 
1 
1 

2(4) 

4(5) 
22(42) 
68(155) 
88 (89) 
17(33) 



0 
• 1 
O 
1(2) 

0 

3(5) 
34(56) 
18 
7(13) 



10(15) 
0 
2 

2(4) 
2 

31(46) 
21(24) 
298 (304) 
68(91). 



(Tables 18 and 19). They include secreted hormones and growth factors, receptors, intracellular
signaling molecules, and transcription factors.

Developmental signaling molecules that are enriched in the human genome include growth factors
such as wnt, transforming growth factor-β (TGF-β), fibroblast growth factor (FGF), nerve growth
factor, platelet derived growth factor (PDGF), and ephrins. These growth factors affect tissue
differentiation and a wide range of cellular processes involving actin-cytoskeletal and nuclear
regulation. The corresponding receptors of these developmental ligands are also expanded in
humans. For example, our analysis suggests at least 8 human ephrin genes (2 in the fly, 4 in the
worm) and 12 ephrin receptors (2 in the fly, 1 in the worm). In the wnt signaling pathway, we find
18 wnt family genes (6 in the fly, 5 in the worm) and 12 frizzled receptors (6 in the fly, 5 in
the worm). The Groucho family of transcriptional corepressors downstream in the wnt pathway are
even more markedly expanded, with 13 predicted members in humans (2 in the fly, 1 in the worm).
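
As a rough illustration of the counts quoted in the preceding paragraph, the fold differences
between human and the two invertebrates can be computed directly. The sketch below uses only the
gene numbers given in that paragraph; the dictionary layout and the helper name `fold_change` are
illustrative assumptions and are not part of the original analysis.

```python
# Gene counts quoted in the paragraph above, as (human, fly, worm).
# The human ephrin count is a lower bound ("at least 8") in the text.
counts = {
    "ephrin": (8, 2, 4),
    "ephrin receptor": (12, 2, 1),
    "wnt": (18, 6, 5),
    "frizzled": (12, 6, 5),
    "Groucho": (13, 2, 1),
}


def fold_change(human, other):
    """Ratio of the human gene count to another organism's count (hypothetical helper)."""
    return human / other


# Map each family to (fold vs fly, fold vs worm).
fold = {
    family: (fold_change(h, f), fold_change(h, w))
    for family, (h, f, w) in counts.items()
}

for family, (vs_fly, vs_worm) in fold.items():
    print(f"{family}: {vs_fly:.1f}x vs fly, {vs_worm:.1f}x vs worm")
```

On these numbers the Groucho family shows the largest ratio of the five, which matches the
text's remark that it is "even more markedly expanded".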

Extracellular adhesion molecules involved in signaling are expanded in the human genome (Tables 18
and 19). The interactions of several of these adhesion domains with extracellular matrix
proteoglycans play a critical role in host defense, morphogenesis, and tissue repair (131).
Consistent with the well-defined role of heparan sulfate proteoglycans in modulating these
interactions (132), we observe an expansion of the heparin sulfate sulfotransferases in the human
genome relative to worm and fly. These sulfotransferases modulate tissue differentiation (133). A
similar expansion in humans is noted in structural proteins that constitute the actin-cytoskeletal
architecture. Compared with the fly and worm, we observe an explosive expansion of the nebulin (35
domains per protein on average), aggrecan (12 domains per protein on average), and plectin (5
domains per protein on average) repeats in humans. These repeats are present in proteins involved
in modulating the actin-cytoskeleton with predominant expression in neuronal, muscle, and vascular
tissues.



Comparison across the five sequenced eukaryotic organisms revealed several expanded protein
families and domains involved in cytoplasmic signal transduction (Table 18). In particular, signal
transduction pathways playing roles in developmental regulation and acquired immunity were
substantially enriched. There is a factor of 2 or greater expansion in humans in the Ras
superfamily GTPases and the GTPase activator and GTP exchange factors associated with them.
Although there are about the same number of tyrosine kinases in the human and C. elegans genomes,
in humans there is an increase in the SH2, PTB, and ITAM domains involved in phosphotyrosine
signal transduction. Further, there is a twofold expansion of phosphodiesterases in the human
genome, compared with either the worm or fly genomes.
The downstream effectors of the intracellular signaling molecules include the transcription
factors that transduce developmental fates. Significant expansions are noted in the ligand-binding
nuclear hormone receptor class of transcription factors compared with the fly genome, although not
to the extent observed in the worm (Tables 18 and 19). Perhaps the most striking expansion in
humans is in the C2H2 zinc finger transcription factors. Pfam detects a total of 4500 C2H2 zinc
finger domains in 564 human proteins, compared with 771 in 234 fly proteins. This means that there
has been a dramatic expansion not only in the number of C2H2 transcription factors, but also in
the number of these DNA-binding motifs per transcription factor (8 on average in humans, 3.3 on
average in the fly, and 2.3 on average in the worm). Furthermore, many of these transcription
factors contain either the KRAB or SCAN domains, which are not found in the fly or worm genomes.
These domains are involved in the oligomerization of transcription factors and increase the
combinatorial partnering of these factors. In general, most of the transcription factor domains
are shared between the three animal genomes, but the reassortment of these domains results in
organism-specific transcription factor families. The domain combinations found in the human, fly,
and worm include the BTB with C2H2 in the fly and humans, and homeodomains alone or in combination
with Pou and LIM domains in all of the animal genomes. In plants, however, a different set of
transcription factors are expanded, namely, the myb family, and a unique set that includes VP1 and
AP2 domain-containing proteins (134). The yeast genome has a paucity of transcription factors,
compared with the multicellular eukaryotes, and its repertoire is limited to the expansion of the
yeast-specific C6 transcription factor family involved in metabolic regulation.
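
The per-protein averages quoted for the C2H2 zinc fingers follow from dividing the total number
of domains by the number of proteins that contain them. A minimal sketch of that arithmetic is
given below. The human and fly figures are those stated in the paragraph above; the worm figures
(155 domains in 68 proteins) are read from the C2H2 row of Table 18 as it appears in this copy
and should be treated as an assumption, since the table is poorly legible. The function name
`domains_per_protein` is illustrative only.

```python
# (number of proteins, total number of C2H2 zinc finger domains)
c2h2 = {
    "human": (564, 4500),  # stated in the text
    "fly": (234, 771),     # stated in the text
    "worm": (68, 155),     # assumed from the Table 18 row; not stated in the prose
}


def domains_per_protein(proteins, domains):
    """Average number of domains per domain-containing protein (hypothetical helper)."""
    return domains / proteins


# Round to one decimal place, the precision used in the text.
averages = {
    organism: round(domains_per_protein(p, d), 1)
    for organism, (p, d) in c2h2.items()
}

print(averages)
```

Rounded to one decimal place, these ratios agree with the averages given in the text (about 8
for human, 3.3 for fly, and 2.3 for worm).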

While we have illustrated expansions in a subset of signal transduction molecules in the human
genome compared with the other eukaryotic genomes, it should be noted that most of the protein
domains are highly conserved. An interesting observation is that worms and humans have
approximately the same number of both tyrosine kinases and serine/threonine kinases (Table 19). It
is important to note, however, that these are merely counts of the catalytic domain; the proteins
that contain these domains also display a wide repertoire of interaction domains with significant
combinatorial diversity.

Hemostasis. Hemostasis is regulated primarily by plasma proteases of the coagulation pathway and
by the interactions that occur between the vascular endothelium and platelets. Consistent with
known anatomical and physiological differences between vertebrates and invertebrates,
extracellular adhesion domains that constitute proteins integral to hemostasis are expanded in the
human relative to the fly and worm (Tables 18 and 19). We note the evolution of domains such as
FIMAC, FN1, FN2, and C1q that mediate surface interactions between hematopoietic cells and the
vascular matrix. In addition, there has been extensive recruitment of more-ancient animal-specific
domains such as VWA, VWC, VWD, kringle, and FN3 into multidomain proteins that are involved in
hemostatic regulation. Although we do not find a large expansion in the total number of serine
proteases, this enzymatic domain has been specifically recruited into several of these multidomain
proteins for proteolytic regulation in the vascular compartment. These are represented in plasma
proteins that belong to the kinin and complement pathways. There is a significant expansion in two
families of matrix metalloproteases: ADAM (a disintegrin and metalloprotease) and MMPs (matrix
metalloproteases) (Table 19). Proteolysis of extracellular matrix (ECM) proteins is critical for
tissue development and for tissue degradation in diseases such as cancer, arthritis, Alzheimer's
disease, and [illegible] of inflammatory conditions (135, 136). ADAMs are a family of integral
membrane proteins with a pivotal role in fibrinogenolysis and modulating interactions between
hematopoietic components and the vascular matrix components. These proteins have been shown to
cleave matrix proteins and even signaling molecules: ADAM-17 [text missing in source] ADAM-10 has
been implicated in the Notch signaling pathway (135). We have identified [number illegible]
members of the matrix metalloprotease family, and a total of [number illegible] members of the
ADAM and ADAM-TS families.

Apoptosis. Evolutionary conservation of some of the apoptotic pathway components across eukarya
is consistent with its central role in developmental regulation and as a response to pathogens and
stress signals. The signal transduction pathways involved in programmed cell death, or apoptosis,
are mediated by interactions between well-characterized domains that include extracellular
domains, adaptor (protein-protein interaction) domains, and those found in effector and regulatory
enzymes (137). We enumerated the protein counts of central adaptor and effector enzyme domains
that are found only in the apoptotic pathways to provide an estimate of divergence across eukarya
and relative expansion in the human genome when compared with the fly and worm (Table 18). Adaptor
domains found in proteins restricted only to apoptotic regulation such as the DED domains are
vertebrate-specific, whereas others [illegible] (although the number of Bcl-2 family members in
humans is significantly expanded). Although plants and yeast lack the caspases, caspase-like
molecules, namely the para- and meta-caspases, have been reported in these organisms (138).
Compared with other animal genomes, the human genome shows an expansion in the adaptor and
effector domain-containing proteins involved in apoptosis, as well as in the proteases involved in
the cascade such as the caspase and calpain families.
Expansions of other protein families.

[illegible] fewer cytochrome P450 genes in humans than in either the fly or worm. Lipoxygenases
(six in humans) [illegible] specific to the vertebrates and plants, whereas the
lipoxygenase-activating proteins (four in humans) may be vertebrate-specific. Lipoxygenases are
involved in arachidonic acid metabolism, and they and their activators have been implicated in
diverse human pathology ranging from allergic responses to cancers. One of the most surprising
human expansions, however, is in the number of glyceraldehyde-3-phosphate dehydrogenase (GAPDH)
genes (46 in humans, 3 in the fly, and 4 in the worm). There is, however, evidence for many
retrotransposed GAPDH pseudogenes (139), which may account for this apparent expansion. However,
it is interesting that GAPDH, long known as a conserved enzyme involved [illegible] phyla from
[illegible] to humans, has recently been shown to have other functions. It has a second catalytic
activity, as a uracil DNA glycosylase (140), and functions as a cell cycle regulator (141) and has
even been implicated in apoptosis (142).

Panther family/subfamily

Ependymin
Ion channels
Acetylcholine receptor
Amiloride-sensitive/degenerin
CNG/EAG
IRK
ITP/ryanodine
Neurotransmitter-gated
P2X purinoceptor
TASK
Transient receptor
Voltage-gated Ca2+ alpha
Voltage-gated Ca2+ alpha-2
Voltage-gated Ca2+ beta
Voltage-gated Ca2+ gamma
Voltage-gated K+ alpha
Voltage-gated KQT
Voltage-gated Na+
Myelin basic protein
Myelin P0
Myelin proteolipid
Myelin-oligodendrocyte glycoprotein
Neuropilin
Plexin
Semaphorin
Synaptotagmin

Defensin
Cytokine
GCSF
GMCSF
Intercrine alpha

Translation. Another striking set of human expansions has occurred
in certain families involved in the translational machinery. We
identified 28 different ribosomal subunits that each have at least
10 copies in the genome; on average, for all ribosomal proteins
there is about an 8- to 10-fold expansion in the number of genes
relative to either the worm or fly. Retrotransposed pseudogenes may
account for many of these expansions [see the discussion above and
(143)]. Recent evidence suggests that a number of ribosomal
proteins have secondary functions independent of their involvement
in protein biosynthesis; for example, L13a and the related L7
subunits (36 copies in humans) have been shown to induce apoptosis
(144).

There is also a four- to fivefold expansion in the elongation
factor 1-alpha family (eEF1A; 56 human genes). Many of these
expansions likely represent intronless paralogs that have
presumably arisen from retrotransposition, and again there is
evidence that many of these may be pseudogenes (145). However, a
second form (eEF1A2) of this factor has been identified with
tissue-specific expression in skeletal muscle and a complementary
expression pattern to the ubiquitously expressed eEF1A (146).
Ribonucleoproteins. Alternative splicing results in multiple
transcripts from a single gene, and can therefore generate
additional diversity in an organism's protein complement. We have
identified 269 genes for ribonucleoproteins. This represents over
2.5 times the number of ribonucleoprotein genes in the worm, two
times that of the fly, and about the same as the 265 identified in
the Arabidopsis genome. Whether the diversity of ribonucleoprotein
genes in humans contributes to gene regulation at either the
splicing or translational level is unknown.

Posttranslational modifications. In this set of processes, the most
prominent expansion is the transglutaminases, calcium-dependent
enzymes that catalyze the cross-linking of proteins in cellular
processes such as hemostasis and apoptosis (147). The vitamin
K-dependent gamma carboxylase gene product acts on the GLA domain
(missing in the fly and worm) found in coagulation factors,
osteocalcin, and matrix GLA protein (148). Tyrosylprotein
sulfotransferases participate in the posttranslational modification
of proteins involved in inflammation and hemostasis, including
coagulation factors and chemokine receptors (149). Although there
is no significant numerical increase in the counts for domains
involved in nuclear protein modification, there are a number of
domain arrangements in the predicted human proteins that are not
found in the other currently sequenced genomes. These include the
tandem association of two histone deacetylase domains in HD6 with a
ubiquitin finger domain, a feature lacking in the fly genome. An
additional example is the co-occurrence of the important nuclear
regulatory enzyme PARP (poly-ADP ribosyl transferase) domain fused
to protein-interaction domains (BRCT and VWA) in humans.

Concluding remarks. There are several possible explanations for the
differences in phenotypic complexity observed in humans when
compared to the fly and worm. Some of these relate to the prominent
differences in the immune system, hemostasis, neuronal, vascular,
and cytoskeletal complexity. The finding that the human genome
contains fewer genes than previously predicted might be compensated
for by combinatorial diversity generated at the levels of protein
architecture, transcriptional and translational control,
posttranslational modification of proteins, or posttranscriptional
regulation. Extensive domain shuffling to increase or alter
combinatorial diversity can provide an exponential increase in the
ability to mediate protein-protein interactions without
dramatically increasing the absolute size of the protein
complement. Evolution of apparently new (from the perspective of
sequence analysis) protein domains and increasing regulatory
complexity by domain accretion, both quantitatively and
qualitatively (recruitment of novel domains with preexisting ones),
are two features that we observe in humans. Perhaps the best
illustration of this trend is the C2H2 zinc finger-containing
transcription factors, where we see expansion in the number of
domains per protein, together with vertebrate-specific domains such
as KRAB and SCAN. Recent reports on the prominent use of internal
ribosomal entry sites in the human genome to regulate translation
of specific classes of proteins suggest that this is an area that
needs further research to identify the full extent of this process
in the human genome (151). At the posttranslational level, although
we provide examples of expansions of some protein families involved
in these modifications, further experimental evidence is required
to evaluate whether this is correlated with increased complexity in
protein processing. Posttranscriptional processing and the extent
of isoform generation in the human remain to be cataloged in their
entirety. Given the conserved nature of the spliceosomal machinery,
further analysis will be required to dissect regulation at this
level.

Table fragment (row labels only; counts not recoverable):

Leucine zipper
Nuclear hormone receptor
Pou-related
Runt-related

8 Conclusions

8.1 The whole-genome sequencing
approach versus BAC by BAC

Experience in applying the whole-genome shotgun sequencing approach
to a diverse group of organisms with a wide range of genome sizes
and repeat content allows us to assess its strengths and
weaknesses. With the success of the method for a large number of
microbial genomes, Drosophila, and now the human, there can be no
doubt concerning the utility of this method. The large number of
microbial genomes that have been sequenced by this method
demonstrate that megabase-sized genomes can be sequenced
efficiently without any input other than the de novo mate-paired
sequences. With more complex genomes like those of Drosophila or
human, map information, in the form of well-ordered markers, has
been critical for long-range ordering of scaffolds. For joining
scaffolds into chromosomes, the quality of the map (in terms of the
order of the markers) is more important than the number of markers
per se. Although this mapping could have been performed
concurrently with sequencing, the prior existence of mapping data
was beneficial. During the sequencing of the A. thaliana genome,
sequencing of individual BAC clones permitted extension of the
sequence well into centromeric regions and allowed high-quality
resolution of complex repeat regions. Likewise, in Drosophila, the
BAC physical map was most useful in regions near the highly
repetitive centromeres and telomeres. WGA has been found to deliver
excellent-quality reconstructions of the unique regions of the
genome. As the genome size, and more importantly the repetitive
content, increases, the WGA approach delivers less of the
repetitive sequence.

The cost and overall efficiency of clone-by-clone approaches makes
them difficult to justify as a stand-alone strategy for future
large-scale genome-sequencing projects. Specific applications of
BAC-based or other clone mapping and sequencing strategies to
resolve ambiguities in sequence assembly that cannot be efficiently
resolved with whole-genome approaches alone are clearly worth
exploring. Hybrid approaches to whole-genome sequencing will only
work if there is sufficient coverage in both the whole-genome
shotgun phase and the BAC clone sequencing phase. Our experience
with human genome assembly suggests that this will require at least
3X coverage of both whole-genome and BAC shotgun sequence data.
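
To get a feel for why coverage depth matters in each phase, the
idealized Poisson (Lander-Waterman style) model of shotgun sampling
can be sketched as below. This model is a standard textbook
approximation brought in here as an assumption; it is not derived
in the text, and the function name is illustrative. It ignores
repeats and cloning bias, so real assemblies will fare worse.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read.

    Assumes reads land uniformly at random (Poisson sampling), which is
    an idealization and not a claim about any particular assembly.
    """
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return 1.0 - math.exp(-coverage)


for c in (1, 3, 5, 8):
    frac = expected_fraction_covered(c)
    print(f"{c}X coverage: about {frac:.1%} of bases expected to be sampled")
```

Under these assumptions, 3X coverage would leave roughly 5% of
bases unsampled in a given data set, which is one way to see why
thin coverage in either phase of a hybrid strategy is costly.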

8.2 The low gene number in humans

We have sequenced and assembled ~95% of the euchromatic sequence of
H. sapiens and used a new automated gene prediction method to
produce a preliminary catalog of the human genes. This has provided
a major surprise: We have found far fewer genes (26,000 to 38,000)
than the earlier molecular predictions (50,000 to over 140,000).
Whatever the reasons for this current disparity, only detailed
annotation, comparative genomics (particularly, using the Mus
musculus genome), and careful molecular dissection of complex
phenotypes will clarify this critical issue of the basic "parts
list" of our genome. Certainly, the analysis is still incomplete
and considerable refinement will occur in the years to come as the
precise structure of each transcription unit is evaluated. A good
place to start is to determine why the gene estimates derived from
EST data are so discordant with our predictions. It is likely that
the following contribute to an inflated gene number derived from
ESTs: the variable lengths of 3'- and 5'-untranslated leaders and
trailers; the little-understood vagaries of RNA processing that
often leave intronic regions in an unspliced condition; the finding
that nearly 40% of human genes are alternatively spliced (153); and
finally, the unsolved technical problems in EST library
construction where contamination from heterogeneous nuclear RNA and
genomic DNA is not uncommon. Of course, it is possible that there
are genes that remain unpredicted owing to the absence of EST or
protein data to support them, although our use of mouse genome data
for predicting genes should limit this number. As was true at the
beginning of genome sequencing, ultimately it will be necessary to
measure mRNA in specific cell types to demonstrate the presence of
a gene.

J. B. S. Haldane speculated in 1937 that a population of organisms
might have to pay a price for the number of genes it can possibly
carry. He theorized that when the number of genes becomes too
large, each zygote carries so many new deleterious mutations that
the population simply cannot maintain itself. On the basis of this
premise, and on the basis of available mutation rates and
x-ray-induced mutations at specific loci, Muller, in 1967 (154),
calculated that the mammalian genome would contain a maximum of not
much more than 30,000 genes (155). An estimate of 30,000 gene loci
for humans was also arrived at by Crow and Kimura (156). Muller's
estimate for D. melanogaster was 10,000 genes, compared to 13,000
derived by annotation of the fly genome (26, 27). These arguments
for the theoretical maximum gene number were based on simplified
ideas of genetic load: that all genes have a certain low rate of
mutation to a deleterious state. However, it is clear that many
mouse, fly, worm, and yeast knockout mutations lead to almost no
discernible phenotypic perturbations.


The modest number of human genes means that we must look elsewhere
for the mechanisms that generate the complexities inherent in human
development and the sophisticated signaling systems that maintain
homeostasis. There are a large number of ways in which the
functions of individual genes and gene products are regulated. The
degree of "openness" of chromatin structure, and hence
transcriptional activity, is regulated by protein complexes that
involve histone and DNA enzymatic modifications. We enumerate many
of the proteins that are likely involved in nuclear regulation in
Table 19. The location, timing, and quantity of transcription are
intimately linked to nuclear signal transduction events as well as
to the tissue-specific expression of many of these proteins.
Equally important are regulatory DNA elements that include
insulators, repeats, and endogenous viruses (157); methylation of
CpG islands in imprinting (158); and promoter-enhancer and intronic
regions that modulate transcription. The spliceosomal machinery
consists of multisubunit proteins (Table 19) as well as structural
and catalytic RNA elements (159) that regulate transcript structure
through alternative start and termination sites and splicing.
Hence, there is a need to study different classes of RNA molecules
(160) such as small nucleolar RNAs, antisense riboregulator RNA,
RNA involved in X-dosage compensation, and other structural RNAs to
appreciate their precise role in regulating gene expression. The
phenomenon of RNA editing, in which coding changes occur directly
at the level of mRNA, is of clinical and biological relevance
(161). Finally, examples of translational control include internal
ribosomal entry sites that are found in proteins involved in cell
cycle regulation and apoptosis (162). At the protein level, minor
alterations in the nature of protein-protein interactions, protein
modifications, and localization can have dramatic effects on
cellular physiology (163). This dynamic system therefore has many
ways to modulate activity, which suggests that definition of
complex systems by analysis of single genes is unlikely to be
entirely successful.

In situ studies have shown that the human genome is asymmetrically
populated with G+C content, CpG islands, and genes (68). However,
the genes are not distributed quite as unequally as had been
predicted (Table 9) (69). The most G+C-rich fraction of the genome,
H3 isochores, constitutes more of the genome than previously
thought (about 9%) and is the most gene-dense fraction, but
contains only 25% of the genes, rather than the predicted ~40%. The
low G+C L isochores make up 65% of the genome and contain 48% of
the genes. This inhomogeneity, the net result of millions of years
of mammalian gene duplication, has been described as the
"desertification" of the vertebrate genome (77). Why are there
clustered regions of high and low gene density, and are these
accidents of history or driven by selection and evolution? If these
deserts are dispensable, it ought to be possible to find mammalian
genomes that are far smaller in size than the human genome. Indeed,
many species of bats have genome sizes that are much smaller than
that of humans; for example, Miniopterus, a species of Italian bat,
has a genome size that is only 50% that of humans (164). Similarly,
Muntiacus, a species of Asian barking deer, has a genome size that
is ~70% that of humans.
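
The isochore percentages quoted above can be turned into relative
gene densities by dividing each fraction's share of the genes by
its share of the genome. The sketch below uses only the figures
given in the text; the dictionary layout and function name are
illustrative.

```python
# Shares quoted in the text: (fraction of genome, fraction of genes).
ISOCHORES = {
    "H3 (most G+C-rich)": (0.09, 0.25),
    "L (low G+C)": (0.65, 0.48),
}


def relative_gene_density(genome_share: float, gene_share: float) -> float:
    """Gene density relative to the genome-wide average (1.0 = average)."""
    return gene_share / genome_share


for name, (genome_share, gene_share) in ISOCHORES.items():
    density = relative_gene_density(genome_share, gene_share)
    print(f"{name}: {density:.2f} times the average gene density")
```

With these numbers the H3 isochores come out at a little under
three times the average gene density, and the L isochores at about
three-quarters of it.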



8.3 Human DNA sequence variation
and its distribution across the genome

This is the first eukaryotic genome in which a nearly uniform
ascertainment of polymorphism has been completed. Although we have
identified and mapped more than 3 million SNPs, this by no means
implies that the task of finding and cataloging SNPs is complete.
These represent only a fraction of the SNPs present in the human
population as a whole. Nevertheless, this first glimpse at
genome-wide variation has revealed strong inhomogeneities in the
distribution of SNPs across the genome. Polymorphism in DNA carries
with it a snapshot of the past operation of population genetic
forces, including mutation, migration, selection, and genetic
drift. The availability of a dense array of SNPs will allow
questions related to each of these factors to be addressed on a
genome-wide basis. SNP studies can establish the range of
haplotypes present in subjects of different ethnogeographic
origins, providing insights into population history and migration
patterns. Although such studies have suggested that modern human
lineages derive from Africa, many important questions regarding
human origins remain unanswered, and more analyses using detailed
SNP maps will be needed to settle these controversies. In addition
to providing evidence for population expansions, migration, and
admixture, SNPs can serve as markers for the extent of evolutionary
constraint acting on particular genes. The correlation between
patterns of intraspecies and interspecies genetic variation may
prove to be especially informative to identify sites of reduced
genetic diversity that may mark loci where sequence variations are
not tolerated.

The remarkable heterogeneity in SNP density implies that there are
a variety of forces acting on polymorphism: sparse regions may have
lower SNP density because the mutation rate is lower, because most
of those regions have a lower fraction of mutations that are
tolerated, or because recent strong selection in favor of a newly
arisen allele "swept" the linked variation out of the population
(165). The effect of random genetic drift also varies widely across
the genome. The nonrecombining portion of the Y chromosome faces
the strongest pressure from random drift because there are roughly
one-quarter as many Y chromosomes in the population as there are
autosomal chromosomes, and the level of polymorphism on the Y is
correspondingly less. Similarly, the X chromosome has a smaller
effective population size than the autosomes, and its nucleotide
diversity is also reduced. But even across a single autosome, the
effective population size can vary because the density of
deleterious mutations may vary. Regions of high density of
deleterious mutations will see a greater rate of elimination by
selection, and the effective population size will be smaller. As a
result, the density of
even completely neutral SNPs will be lower in such regions. There
is a large literature on the association between SNP density and
local recombination rates in Drosophila, and it remains an
important task to assess the strength of this association in the
human genome, because of its impact on the design of local SNP
densities for disease-association studies. It also remains an
important task to validate SNPs on a genomic scale in order to
assess the degree of heterogeneity among geographic and ethnic
populations.

8.4 Genome complexity

We will soon be in a position to move away from the cataloging of
individual components of the system, and beyond the simplistic
notions of "this binds to that, which then docks on this, and then
the complex moves there..." to the exciting area of network
perturbations, nonlinear responses and thresholds, and their
pivotal role in human diseases.

The enumeration of other "parts lists" reveals that in organisms
with complex nervous systems, neither gene number, neuron number,
nor number of cell types correlates in any meaningful manner with
even simplistic measures of structural or behavioral complexity.
Nor would they be expected to; this is the realm of nonlinearities
and epigenesis. The 520 million neurons of the common octopus
exceed the neuronal number in the brain of a mouse by an order of
magnitude. It is apparent from a comparison of genomic data on the
mouse and human, and from comparative mammalian neuroanatomy (169),
that the morphological and behavioral diversity found in mammals is
underpinned by a similar gene repertoire and similar
neuroanatomies. For example, when one compares a pygmy marmoset
(which is only 4 inches tall and weighs about 6 ounces) to a
chimpanzee, the brain volume of this minute primate is found to be
only about 1.5 cm3, two orders of magnitude less than that of a
chimp and three orders less than that of humans. Yet the
neuroanatomies of all three brains are strikingly similar, and the
behavioral characteristics of the pygmy marmoset are little
different from those of chimpanzees. Between humans and
chimpanzees, the gene number, gene structures and functions,
chromosomal and genomic organizations, and cell types and
neuroanatomies are almost indistinguishable, yet the developmental
modifications that predisposed human lineages to cortical expansion
and development of the larynx, giving rise to language, culminated
in a massive singularity that by even the simplest of criteria made
humans more complex in a behavioral sense.

Simple examination of the number of neurons, cell types, or genes
or of the genome size does not alone account for the differences in
complexity that we observe. Rather, it is the interactions within
and among these sets that result in such great variation. In
addition, it is possible that there are "special cases" of
regulatory gene networks that have a disproportionate effect on the
overall system. We have presented several examples of "regulatory
genes" that are significantly increased in the human genome
compared with the fly and worm. These include extracellular ligands
and their cognate receptors (e.g., wnt, frizzled, TGF-β, ephrins,
and connexins), as well as nuclear regulators (e.g., the KRAB and
homeodomain transcription factor families), where a few proteins
control broad developmental processes. The answers to these
"complexities" perhaps lie in these expanded gene families and
differences in the regulatory control of ancient genes, proteins,
pathways, and cells.

8.5 Beyond single components

While few would disagree with the intuitive conclusion that
Einstein's brain was more complex than that of Drosophila, closer
comparisons such as whether the set of predicted human proteins is
more complex than the protein set of Drosophila, and if so, to what
[...] or protein-protein interaction [...] context-dependent [...]



Currently, there are more than 30 different mathematical
descriptions of complexity (170). However, we have yet to
understand the mathematical dependency relating the number of genes
with organism complexity. One pragmatic approach to the analysis of
biological systems, which are composed of nonidentical elements
(proteins, protein complexes, interacting cell types, and
interacting neuronal populations), is through graph theory (171).
The elements of the system can be represented by the vertices of
complex topographies, with the edges representing the interactions
between them. Examination of large networks reveals that they can
self-organize, but more important, they can be particularly robust.
This robustness is not due to redundancy, but is a property of
inhomogeneously wired networks. The error tolerance of such
networks comes with a price; they are vulnerable to the selection
or removal of a few nodes that contribute disproportionately to
network stability. Gene knockouts provide an illustration: Some
knockouts may have minor effects, whereas others have catastrophic
effects on the system. In the case of vimentin, a supposedly
critical component of the cytoplasmic intermediate filament network
of mammals, the knockout of the gene in mice reveals them to be
reproductively normal, with no obvious phenotypic effects (172),
and yet the usually conspicuous vimentin network is completely
absent. On the other hand, ~30% of knockouts in Drosophila and mice
correspond to critical nodes whose reduction in gene product, or
total elimination, causes the network to crash most of the time,
although even in some of these cases, phenotypic normalcy ensues,
given the appropriate genetic background. Thus, there are no "good"
genes or "bad" genes, but only networks that exist at various
levels and at different connectivities, and at different states of
sensitivity to perturbation. Sophisticated mathematical analysis
needs to be constantly evaluated against hard biological data sets
that specifically address network dynamics. Nowhere is this more
critical than in attempts to come to grips with "complexity,"
particularly because deconvoluting and correcting complex networks
that have undergone perturbation, and have resulted in human
diseases, is the greatest significant challenge now facing us.
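
The contrast between tolerated and catastrophic knockouts in an
inhomogeneously wired network can be illustrated with a toy graph.
The sketch below is a hypothetical example, not a model of any real
gene network: the hub-and-leaf topology, node names, and function
names are all assumptions chosen so that the effect is easy to see.

```python
import random
from collections import defaultdict


def build_hub_network(n_hubs=3, leaves_per_hub=10):
    """Toy inhomogeneous network: a few interlinked hubs, each with many leaves."""
    adj = defaultdict(set)
    hubs = [f"hub{i}" for i in range(n_hubs)]
    for i, hub in enumerate(hubs):
        for other in hubs[i + 1:]:
            adj[hub].add(other)
            adj[other].add(hub)
        for j in range(leaves_per_hub):
            leaf = f"{hub}_leaf{j}"
            adj[hub].add(leaf)
            adj[leaf].add(hub)
    return adj


def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    removed = set(removed)
    seen, best = set(), 0
    for start in adj:
        if start in removed or start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:  # iterative depth-first search
            node = stack.pop()
            size += 1
            for nxt in adj[node]:
                if nxt not in removed and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best


def knock_out_hubs(adj, k):
    """Targeted knockout: the k highest-degree nodes."""
    return sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:k]


def knock_out_random(adj, k, seed=0):
    """Random knockout of k nodes (seeded so runs are repeatable)."""
    return random.Random(seed).sample(sorted(adj), k)


net = build_hub_network()
print("intact network:", largest_component(net, []))
print("3 random knockouts:", largest_component(net, knock_out_random(net, 3)))
print("3 hub knockouts:", largest_component(net, knock_out_hubs(net, 3)))
```

In this toy network most nodes are leaves, so a random knockout
usually removes a leaf and leaves the network largely connected,
whereas removing the three hubs isolates every remaining node.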

It has been predicted for the last 15 years that complete
sequencing of the human genome would open up new strategies for
human biological research and would have a major impact on
medicine, and through medicine and public health, on society.
Effects on biomedical research are already being felt. This
assembly of the human genome sequence is but a first, hesitant step
on a long and exciting journey toward understanding the role of the
genome in human biology. It has been possible only because of
innovations in instrumentation and software that have allowed
automation of almost every step of the process from DNA preparation
to annotation. The next steps are clear: We must define the
complexity that ensues when this relatively modest set of about
30,000 genes is expressed. The sequence provides the framework upon
which all the genetics, biochemistry, physiology, and ultimately
phenotype depend. It provides the boundaries for scientific
inquiry. The sequence is only the first level of understanding of
the genome. All genes and their control elements must be
identified; their functions, in concert as well as in isolation,
defined; their sequence variation worldwide described; and the
relation between genome variation and specific phenotypic
characteristics determined. Now we know what we have to explain.
Another paramount challenge awaits: public discussion of this
information and its potential for improvement of personal health.
Many diverse sources of data have shown that any two individuals
are more than 99.9% identical in sequence, which means that all the
glorious differences among individuals in our species that can be
attributed to genes fall in a mere 0.1% of the sequence. There are
two fallacies to be avoided: determinism, the idea that all
characteristics of the person are "hard-wired" by the genome; and
reductionism, the view that with complete knowledge of the human
genome sequence, it is only a matter of time before our
understanding of gene functions and interactions will provide a
complete causal description of human variability. The real
challenge of human biology, beyond the task of finding out how
genes orchestrate the construction and maintenance of the
miraculous mechanism of our bodies, will lie ahead as we seek to
explain how our minds have come to organize thoughts sufficiently
well to investigate our own existence.
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containing on y A or B domains, so long a? A-B- 

Ui^in?A „?» ? " ''"ile^omain proteins eon- 
^ of ,t domains. A second Ihleresting prop- 

produce a similariy matrix for the proteome as a 
and vT~ ■ ' '"^ P"'*'" '"^y- error-prone 
nmSLVk 'f ''"''* «!tJ>er sequence have slg- 
defined stmiUnty to each other, only that they 
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^ shf e at least one significant BIAST Wt \n common. 
This is an especially interesting property of the metric, because
it allows the rapid recovery of protein families from the proteome
for which no multiple alignment is possible, thus providing a
computational basis for the extension of protein homology searches
beyond those of current HMM- and profile-based search methods. Once
the whole-proteome similarity matrix has been calculated, Lek first
partitions the proteome into single-linkage clusters (27) on the
basis of one or more shared BLAST hits between two sequences. Next,
these single-linkage clusters are further partitioned into
subclusters, each member of which shares a user-specified pairwise
similarity with the other members of the cluster, as described
above. For the purposes of this publication, we have focused on the
analysis of single-linkage clusters and what we have termed
"complete clusters," e.g., those subclusters for which every member
has a similarity metric of 1 to every other member of the
subcluster. We believe that the single-linkage and complete
clusters are of special interest, in part, because they allow us to
estimate and to compare sizes of core protein sets in a rigorous
manner. The rationale for this is as follows: If one imagines for a
moment a perfect clustering algorithm capable of perfectly
partitioning one or more perfectly annotated protein sets into
protein families, it is reasonable to assume that the number of
clusters will always be greater than, or equal to, the number of
single-linkage clusters, because single-linkage clustering is a
maximally agglomerative clustering method. Thus, if there exists a
single protein in the predicted protein set containing domains A
and B, then it will be clustered by single linkage together with
all single-domain proteins containing domains A or B. Likewise, for
a predicted protein set containing a single multidomain protein,
the number of real clusters must always be less than or equal to
the number of complete clusters, because it is impossible to place
a unique multidomain protein into a complete cluster. Thus, the
single-linkage and complete clusters plus singletons should
comprise a lower and upper bound of sizes of core protein sets,
respectively, allowing us to compare the relative size and
complexity of different organisms' predicted protein set
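
The lower-bound and upper-bound argument above can be made concrete
with a small sketch. It does not reproduce Lek or its similarity
metric: the toy hit sets, the function names, and the use of
"identical hit sets" as a stand-in for complete clusters are all
assumptions made for illustration.

```python
from collections import defaultdict


def single_linkage_clusters(hits):
    """Group proteins that share at least one hit, transitively (union-find)."""
    parent = {protein: protein for protein in hits}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    first_seen = {}  # hit -> first protein observed with that hit
    for protein, protein_hits in hits.items():
        for hit in protein_hits:
            if hit in first_seen:
                parent[find(protein)] = find(first_seen[hit])
            else:
                first_seen[hit] = protein

    clusters = defaultdict(set)
    for protein in hits:
        clusters[find(protein)].add(protein)
    return sorted(sorted(members) for members in clusters.values())


def identical_hit_groups(hits):
    """Stand-in for complete clusters: proteins whose hit sets are identical."""
    groups = defaultdict(set)
    for protein, protein_hits in hits.items():
        groups[frozenset(protein_hits)].add(protein)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical protein set: hits are labeled by the domain they match.
toy = {
    "p1": {"A"}, "p2": {"A"},
    "p3": {"B"}, "p4": {"B"},
    "p5": {"A", "B"},  # the single multidomain protein
    "p6": {"C"},
}

print("single-linkage clusters:", single_linkage_clusters(toy))
print("identical-hit groups (incl. singletons):", identical_hit_groups(toy))
```

In this toy set the A-B protein pulls all A-only and B-only
proteins into one single-linkage cluster, while under the stricter
grouping it remains a singleton, so the two counts bracket the
three domain families that were built into the example.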
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anrangements. Thus, the probabiUty of chance occur- 
rence Is UN^\ Allowing for both sets of genes (e g 
ABC and A'B'C) to be spread across / positioni 
Increases this to LyN^''\ The duplicated segment 
might be rearranged by the operations of reversal or 
translocation; aUowing for M such rearrangements 
gn^es us a probablCt/ p « L^H/N*'\ For example, the 
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locations, is the expected number of such 
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EXHIBIT T 




> A historic 
moment for 
the scientific 
endeavor. 



THE HUA4AN 
GENOME 

■ umanity has been given a great gift. With the completion of the huirian . . 
genome sequence, we have received a powerful tool for unlocking the " . 
secrets of our genetic heritage and for finding our place among the other 
participants in the adventure of life. 

This week's issue of Science contains the report of the sequencing of
the human genome from a group of authors led by Craig Venter of Celera
Genomics. The report of the sequencing of the human genome from the
publicly funded consortium of laboratories led by Francis Collins appears
in this week's Nature. This stunning achievement has been portrayed,
often unfairly, as a competition between two
ventures, one public and one private. That characterization detracts from
the awesome accomplishment jointly unveiled this week. In truth, each
project contributed to the other. The inspired vision that launched the
publicly funded project roughly 10 years ago reflected, and now rewards,
the confidence of those who believe that the pursuit of large-scale fundamental
problems in the life sciences is in the national interest. The technical
innovation and drive of Craig Venter and his colleagues made it possible
to celebrate this accomplishment far sooner than was believed possible.
Thus, we can salute what has become, in the end, not a contest but a
marriage (perhaps encouraged by shotgun) between public funding and
private entrepreneurship.

There are excellent scientific reasons for applauding an outcome that
has given us two winners. Two sequences are better than one; the opportunity for comparison and convergence
is invaluable. Indeed, a real-world proof of the importance of access to both sets of data can
be found in the pages of this issue of Science, in the comparative analysis by Olivier et al. (p. 1298).

Although we have made the point before, it is worth repeating that the sequencing of the human
genome represents, not an ending, but the beginning of a new approach to biology. As Galas says in
his Viewpoint (p. 1257), the knowledge that all of the genetic components of any process can be
identified will give extraordinary new power to scientists. Because of this breakthrough, research
can evolve from analyzing the effects of individual genes to a more integrated view that examines
whole ensembles of genes as they interact to form a living human being. Several articles in this issue
highlight how this approach is already beginning to revolutionize the way we look at human disease.
This has been a massive project, on a scale unparalleled in the history of biology, but of course
it has built on the scientific insights of centuries of investigators. By coincidence, this landmark
announcement falls during the week of the anniversary of the birth of Charles Darwin. Darwin's
message that the survival of a species can depend on its ability to evolve in the face of change is
peculiarly pertinent to discussions that have gone on in the past year over access to the Celera data.
(Full information regarding the agreements that were reached to make the data available can be
found at www.sciencemag.org/feature/data/announcement/gsp.shl.) We are willing to be flexible, in
allowing data repositories other than the traditional GenBank, while insisting on access to all the
data needed to verify conclusions. In this domain, change is everywhere: Commercial researchers
are producing more and more potentially valuable sequences, yet (at least in the United States)
laws governing databases provide scant protection against piracy. Had the Celera data been kept secret,
it would have been a serious loss to the scientific community. We hope that our adaptability in
the face of change will enable other proprietary data to be published after peer review, in a way that
satisfies our continuing commitment to full access.

It should be no surprise that an achievement so stunning, and so carefully watched, has created
new challenges for the scientific venture. Science is proud to have played a role in bringing this
discovery onto the public stage. It is literally true that this is a historic moment for the scientific endeavor.
The human genome has been called the Book of Life. Rather, it is a library, in which, with
rules that encourage exploration and reward creativity, we can find many of the books that will
help define us and our place in the great tapestry of life.

Barbara R. Jasny and Donald Kennedy
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EXHIBIT U 



Query= SEQ ID NO:1
         (1278 letters)

                                                          Score     E
Sequences producing significant alignments:               (bits)  Value

AC025750.10.1.151367                                       1287    0.0

>AC025750.10.1.151367
          Length = 151367

 Score = 1287 bits (649), Expect = 0.0
 Identities = 649/649 (100%)
 Strand = Plus / Minus



Query: 630   caggataattgaagctatctgcataggttggttcactgccgagtgcatcgtgaggttcat 689
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 8482  caggataattgaagctatctgcataggttggttcactgccgagtgcatcgtgaggttcat 8423

Query: 690   tgtctccaaaaacaagtgtgagtttgtcaagagacccctgaacatcattgatttactggc 749
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 8422  tgtctccaaaaacaagtgtgagtttgtcaagagacccctgaacatcattgatttactggc 8363

Query: 750   aatcacgccgtattacatctctgtgttgatgacagtgtttacaggcgagaactctcaact 809
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 8362  aatcacgccgtattacatctctgtgttgatgacagtgtttacaggcgagaactctcaact 8303

Query: 810   ccagagggctggagtcaccttgagggtacttagaatgatgaggattttttgggtgattaa 869
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 8302  ccagagggctggagtcaccttgagggtacttagaatgatgaggattttttgggtgattaa 8243

Query: 870   gcttgcccgtcacttcattggtcttcagacactcggtttgactctcaaacgttgctaccg 929
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 8242  gcttgcccgtcacttcattggtcttcagacactcggtttgactctcaaacgttgctaccg 8183

Query: 930   agagatggttatgttacttgtcttcatttgtgttgccatggcaatctttagtgcactttc 989
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 8182  agagatggttatgttacttgtcttcatttgtgttgccatggcaatctttagtgcactttc 8123

Query: 990   tcagcttcttgaacatgggctggacctggaaacatccaacaaggactttaccagcattcc 1049
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 8122  tcagcttcttgaacatgggctggacctggaaacatccaacaaggactttaccagcattcc 8063

Query: 1050  tgctgcctgctggtgggtgattatctctatgactacagttggctatggagatatgtatcc 1109
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 8062  tgctgcctgctggtgggtgattatctctatgactacagttggctatggagatatgtatcc 8003

Query: 1110  tatcacagtgcctggaagaattcttggaggagtttgtgttgtcagtggaattgttctatt 1169
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 8002  tatcacagtgcctggaagaattcttggaggagtttgtgttgtcagtggaattgttctatt 7943

Query: 1170  ggcattacctatcacttttatctaccatagctttgtgcagtgttatcatgagctcaagtt 1229
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 7942  ggcattacctatcacttttatctaccatagctttgtgcagtgttatcatgagctcaagtt 7883

Query: 1230  tagatctgctaggtatagtaggagcctctccactgaattcctgaattaa 1278
             |||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 7882  tagatctgctaggtatagtaggagcctctccactgaattcctgaattaa 7834



Score = 1255 bits (633), Expect = 0.0 
Identities = 633/633 (100%) 
Strand = Plus / Minus 



Query: 1     atgaccttcgggcgcagcggggcggcctcggtggtgctgaacgtgggcggcgcccggtat 60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 57401 atgaccttcgggcgcagcggggcggcctcggtggtgctgaacgtgggcggcgcccggtat 57342

Query: 61    tcgctgtcccgggagctgctgaaggacttcccgctgcgccgcgtgagccggctgcacggc 120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 57341 tcgctgtcccgggagctgctgaaggacttcccgctgcgccgcgtgagccggctgcacggc 57282

Query: 121   tgccgctccgagcgcgacgtgctcgaggtgtgcgacgactacgaccgcgagcgcaacgag 180
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 57281 tgccgctccgagcgcgacgtgctcgaggtgtgcgacgactacgaccgcgagcgcaacgag 57222

Query: 181   tacttcttcgaccggcactcggaggccttcggcttcatcctgctctacgtgcgcggccac 240
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 57221 tacttcttcgaccggcactcggaggccttcggcttcatcctgctctacgtgcgcggccac 57162

Query: 241   ggcaagctgcgcttcgcgccgcggatgtgcgagctctccttctacaacgagatgatctac 300
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 57161 ggcaagctgcgcttcgcgccgcggatgtgcgagctctccttctacaacgagatgatctac 57102

Query: 301   tggggcctggagggcgcgcacctcgagtactgctgccagcgccgcctcgacgaccgcatg 360
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 57101 tggggcctggagggcgcgcacctcgagtactgctgccagcgccgcctcgacgaccgcatg 57042

Query: 361   tccgacacctacaccttctactcggccgacgagccgggcgtgctgggccgcgacgaggcg 420
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 57041 tccgacacctacaccttctactcggccgacgagccgggcgtgctgggccgcgacgaggcg 56982

Query: 421   cgccccggcggggccgaggcggctccctccaggcgctggctggagcgcatgcggcggacc 480
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 56981 cgccccggcggggccgaggcggctccctccaggcgctggctggagcgcatgcggcggacc 56922

Query: 481   ttcgaggagcccacgtcgtcgctggccgcgcagatcctggctagcgtgtcggtggtgttc 540
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 56921 ttcgaggagcccacgtcgtcgctggccgcgcagatcctggctagcgtgtcggtggtgttc 56862

Query: 541   gtgatcgtgtccatggtggtgctgtgcgccagcacgttgcccgactggcgcaacgcagcc 600
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 56861 gtgatcgtgtccatggtggtgctgtgcgccagcacgttgcccgactggcgcaacgcagcc 56802

Query: 601   gccgacaaccgcagcctggatgaccggagcagg 633
             |||||||||||||||||||||||||||||||||
Sbjct: 56801 gccgacaaccgcagcctggatgaccggagcagg 56769
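
The coordinates printed in the report above can be cross-checked with a short script. The following Python sketch is not part of the original exhibit; the function names are illustrative. It takes the start and end positions of the two alignments as printed, confirms that each span equals the reported identity length, checks that the subject coordinates run in the Minus direction, and computes how much of the 1278-letter query the two alignments cover together.

```python
def hsp_span(start, end):
    """Length in bases of an inclusive coordinate range, in either orientation."""
    return abs(end - start) + 1


# (query_start, query_end, sbjct_start, sbjct_end, reported_length),
# copied from the two alignments in the report above.
hsps = [
    (630, 1278, 8482, 7834, 649),
    (1, 633, 57401, 56769, 633),
]


def check_hsp(q_start, q_end, s_start, s_end, reported):
    """Return True if both spans match the reported length and the strand is Plus / Minus."""
    same_length = hsp_span(q_start, q_end) == reported == hsp_span(s_start, s_end)
    plus_minus = q_start < q_end and s_start > s_end
    return same_length and plus_minus


def query_coverage(hsp_list, query_length):
    """Fraction of query positions covered by at least one alignment."""
    covered = set()
    for q_start, q_end, *_ in hsp_list:
        covered.update(range(q_start, q_end + 1))
    return len(covered) / query_length


all_consistent = all(check_hsp(*hsp) for hsp in hsps)
coverage = query_coverage(hsps, 1278)
print(all_consistent, coverage)
```

Under these inputs the two alignments overlap by four query positions (630 to 633), so together they span the whole query.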
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Regulated expression of a vitellogenin fusion gene in transgenic 
nematodes. 

Spieth J, MacMorris M, Broverman S, Greenspoon S, Blumenthal T. 

Program in Molecular, Cellular, and Developmental Biology, Indiana 
University, Bloomington 47405. 

In Caenorhabditis elegans the vitellogenin genes are expressed abundantly in 
the adult hermaphrodite intestine, but are otherwise silent. In order to begin to 
understand the mechanisms by which this developmental regulation occurs, we 
used the transformation procedure developed for C. elegans by A. Fire (EMBO
J. 1986, 5, 2673-2680) to obtain regulated expression of an introduced
vitellogenin fusion gene. A plasmid with vit-2 upstream and coding sequences 
fused to coding and downstream sequences of vit-6 was injected into oocytes 
and stable transgenic strains were selected. We obtained seven independent 
strains in which the plasmid DNA is integrated at a low copy number. All 
strains synthesize substantial amounts of a novel vitellogenin-like polypeptide
of 155 kDa that accumulates in the intestine and pseudocoelom, but is not 
transported efficiently into oocytes. In two strains examined in detail the fusion 
gene is expressed with correct sex, tissue, and stage specificity. Thus we have 
demonstrated that the nematode transgenic system can give proper 
developmental expression of introduced genes and so can be used to identify 
DNA regulatory regions. 

PMID: 3181632 [PubMed - indexed for MEDLINE] 
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Genetic transformation of mouse embryos by microinjection of 
purified DNA 

(gene transfer/mice) 

Jon W. Gordon*, George A. Scangos^, Diane J. Plotkin*, James A. Barbosa*, and
Frank H. Ruddle**

•Department of Biology and *Department of Human Genetics, Yale University, New Haven, Connecticut 06511
Contributed by Frank H. Ruddle, September 23, 1980



ABSTRACT A recombinant plasmid composed of segments
of herpes simplex virus and simian virus 40 viral DNA inserted
into the bacterial plasmid pBR322 was microinjected into
pronuclei of fertilized mouse oocytes. The embryos were implanted
in the oviducts of pseudopregnant females and allowed
to develop to term. DNA from newborn mice was evaluated by
the Southern blotting technique for the presence of DNA homologous
to the injected plasmid. Two of 78 mice in one series
of injections showed clear homology, though the injected sequences
had been rearranged. Band intensities from the two
positive mice were consistent with the presence of donor DNA
in most or all of the cells of the newborns. These results demonstrate
that genes can be introduced into the mouse genome
by direct insertion into the nuclei of early embryos. This technique
affords the opportunity to study problems of gene regulation
and cell differentiation in a mammalian system by application
of recombinant DNA technology.
Introduction of specified gene sequences into mammalian 
embryos can be a powerful tool for the study of developmental 
genetic problems. The fate of such genes can be monitored 
throughout development by using sensitive probing techniques 
offered by recombinant DNA technology. In addition, the 
functioning of foreign genes in a normal host environment can 
be used to study the processes of gene regulation and to study 
the physiologic roles of products of such genes more precisely. 
Introduction of foreign DNA into all cells of an intact animal 
also provides an opportunity to pass sequences to offspring and
to generate large numbers of transformed animals. In order to
realize these benefits, it is necessary to transform embryos early 
in development and allow integration of foreign DNA into the 
cellular progenitors of the entire animal. 

Such experiments with mammals are difficult. Zygotes must 
be maintained in culture conditions that at least grossly ap- 
proximate the oviductal environment. Moreover, they can be 
maintained in vitro for only a few days, after which they must 
be returned to a female for implantation and further devel- 
opment. Insertion of material into early mammalian embryos 
is also difficult because of their small size. 

Investigators have recently succeeded in constructing mosaic 
mice composed in part of descendants from cultured terato- 
carcinoma cells (1-3). This advance makes possible the intro- 
duction of genes into cultured cells, which might then be in- 
duced to cooperate in the formation of an intact adult mouse 
(4, 5). These cultured cells are often aneuploid, however, and 
some difficulty has been encountered in obtaining functional 
germ cells derived from them (6). Another problem with teratoma
mosaics is that they are, indeed, mosaics. Thus, teratoma
cells of XX chromosomal constitution cannot make sperm in 

The publication costs of this article were defrayed in part by page
charge payment. This article must therefore be hereby marked "advertisement"
in accordance with 18 U.S.C. §1734 solely to indicate
this fact.



mice that develop as males; the possibility of germ-line trans- 
mission in this system is accordingly reduced. Jaenisch and 
Mintz (7) have provided evidence that whole DNA of simian 
virus 40 (SV40), when placed in cavities of mouse blastocysts, 
may be found in the resultant offspring. Ideally, however, one 
would like to introduce a small amount of well-defined genetic 
material directly into normal embryos and allow this material 
to integrate and function within the host genome. 

We have approached this problem by injecting DNA directly 
into the pronuclei of fertilized mouse oocytes. The one-cell stage 
was chosen in order to limit as much as possible the develop- 
ment of mosaicism during cleavage. To avoid the hazards of 
culture, injected embryos were immediately implanted into 
the oviducts of pseudopregnant recipients. The DNA chosen 
for injection was the bacterial plasmid pBR322 into which had 
been inserted fragments of herpes simplex and SV40 viral DNA.
This plasmid was constructed because the SV40 fragment is 
known to contain an origin of DNA replication, whereas the 
herpes fragment codes for a gene product, thymidine kinase 
(TK), distinguishable from the endogenous mouse enzyme. 
DNA was extracted from newborn mice and screened by the 
Southern blotting technique for the presence of sequences homologous
to the injected plasmid. Two of 78 mice evaluated
in one experimental series were found to contain such se- 
quences. In both instances the injected DNA had been modi- 
fied, but it could be demonstrated to be derived from donor 
material. The intensity of the positive bands indicated that an 
amount of DNA roughly equivalent to one copy in every cell 
of the newborns was retained. We thus provide evidence that 
mice can be genetically transformed by direct insertion of DNA 
into early embryos. 

MATERIALS AND METHODS 

Mice. CD-1 mice were obtained from the Charles River
Breeding Laboratories. B6D2F1 mice were obtained from the
Jackson Laboratory. All mice were maintained on a 14:10
light-dark schedule (lights off at 10 p.m., on at 8 a.m.). Six-week-old
females were induced to superovulate with 5 international
units of pregnant mares' serum (Gestyl, Organon) at
4 p.m. followed 48 hr later by 2.5 international units of human
chorionic gonadotropin (Pregnyl, Organon) and placed immediately
with males for mating. B6D2F1 females were mated
with CD-1 males; CD-1 females were mated with B6D2F1
males. On the same evening other mature CD-1 females were
placed with vasectomized CD-1 males. On the morning after
mating (day 0) all female mice were examined for vaginal 
plugs. Six-week-old females were killed at 2 p.m. on day 0 and 



Abbreviations: SV40, simian virus 40; TK, thymidine kinase; kb, kilobase(s).

♦ Present address: Department of Biology, The Johns Hopkins University,
Baltimore, MD 21218.
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their oviducts were removed into Krebs-Ringer bicarbonate- 
buffered medium supplemented with bovine serum albumin 
(8) and hyaluronidase at 1 mg/ml. Oviducts were opened with 
forceps and the fertilized eggs with remaining follicle cells were
expressed into the dish. After 1-2 min, eggs were removed and
washed three times in 2 ml of culture medium equilibrated with
5% CO2 in air at 37°C. Eggs containing pronuclei were identified
under the dissecting microscope and placed in lots of 20
in a microdrop of equilibrated medium, which was placed in 
a 100-mm tissue culture dish and covered with mineral oil 
(Mallinckrodt 6358). Eggs were stored in this manner in the
incubator until microinjected. 

Microinjection. Microneedles were pulled from thin-walled
no. 1211L Omega Dot tubing (Glass Co. of America) on a DKI
model 700C pipette puller. Holding pipettes were pulled by
hand on a microburner from G-12 capillary tubing (Thomas),
and fire polished on a Sensaur microforge. The tips of the microneedles
were allowed to fill with plasmid suspension by
capillary action and the barrels were then filled with Fluorinert
(3M FC77). They were then secured in PE-190 intramedic
tubing on a Leitz micromanipulator. Holding pipettes were also
filled with Fluorinert and similarly secured in PE-90 tubing.
The tubing was likewise filled with Fluorinert and attached to
1-cm³ Hamilton syringes. All manipulations were carried out
on a Leitz microscope. 

Tissue culture dishes containing the fertilized eggs were 
placed on the microscope and eggs were positioned by holding 
the pipette such that a pronucleus near the plasma membrane 
was close to the microneedle. The microneedle was inserted into 
the pronucleus and enough plasmid suspension was injected to 
cause an approximate doubling of the pronuclear volume 
(approximately 1 pl). Eggs that survived microinjection were
removed and stored in a 30-mm tissue culture dish containing 
2 ml of equilibrated medium until all microinjections were 
completed. Injection of 40-60 embryos required 1-2 hr. 

Implantation. Plugged pseudopregnant CD-1 females were
anesthetized with Nembutal at 6 mg/100 g of body weight.
Ovaries were located through a dorsal incision. The ovarian
bursa was torn away with no. 5 Dumont watchmaker's forceps,
taking care not to rupture large blood vessels. The ostium of the
oviduct was visualized under the dissecting microscope and a
pipette containing 10-20 microinjected embryos was inserted
into it. The eggs were expelled into the oviduct and the wound
was closed with wound clips. Mice were examined on days
18-21 for the delivery of live offspring. Newborn mice were
stored at -50°C for later analysis. Sixty percent of the embryos
survived microinjection; 30-50% of the survivors developed into
live young. All newborns were normal in appearance. All microinjection
work was carried out under P1 containment in
accordance with National Institutes of Health guidelines.
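
The two survival figures quoted above compound. As a rough illustration (a sketch added here, not a calculation from the paper; the function name and the batch size of 100 are hypothetical), the expected number of live young per batch of injected embryos can be estimated as follows:

```python
def expected_live_young(n_injected, survive_injection=0.60,
                        to_term_low=0.30, to_term_high=0.50):
    """Estimate the range of live young from n_injected embryos.

    The default rates are the figures quoted in the text: 60% of embryos
    survive microinjection, and 30-50% of survivors develop to term.
    """
    survivors = n_injected * survive_injection
    return survivors * to_term_low, survivors * to_term_high


# Hypothetical batch of 100 injected embryos.
low, high = expected_live_young(100)
print(f"{low:.0f} to {high:.0f} live young per 100 injected embryos")
```

On these figures, roughly 18 to 30 of every 100 injected embryos would be expected to yield live offspring.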

DNA Isolation. DNA was isolated from whole newborn mice
by the method of Blin and Stafford (9) with the following 
modifications. Powdered tissue was incubated for 4 hr at 50°C 
in 22 ml of 0.28 M EDTA/0.5% Sarkosyl, pH 7.0. The homog- 
enate was subsequently extracted twice in phenol/chloro- 
form/isoamyl alcohol (15 ml:5 ml:0.2 ml), and once in chloro- 
form/isoamyl alcohol (15 ml:0.6 ml). The extract was dialyzed 
for 24 hr against 10 mM Tris-HCl, pH 8.0/10 mM NaCl/1 mM
EDTA and precipitated with a 2-vol excess of 100% ethyl alcohol.
Precipitated DNA was stored at -20°C until use.

Filter Hybridization. DNA was redissolved in 1X TEN (10
mM Tris-HCl, pH 7.75/10 mM NaCl/0.1 mM EDTA) to yield
a final concentration of approximately 1 mg/ml. Twenty micrograms
of DNA was digested at a 10- to 20-fold excess with
appropriate restriction enzymes (Bethesda Research Laboratories,
Rockville, MD). After overnight digestion at 37°C,



samples were electrophoresed in 1% agarose in 160 mM Tris-
HCl/80 mM NaOAc/80 mM NaCl/5 mM EDTA, pH 8, at 350
A for 22 hr. Samples were then blotted onto nitrocellulose filters 
according to the method of Southern (10). 

Nick translations were performed by using the New England 
Nuclear nick translation kit with ³²P-labeled dCTP obtained
from New England Nuclear. Filter hybridizations were performed
as described by Wahl et al. (11). Filters were then used
to expose Kodak X-Omat x-ray film, using intensifying screens,
until band intensities were appropriate for analysis.

Construction of the Plasmid. The recombinant plasmids,
called pST6, pST9 and pST12, carrying the SV40 origin of
replication and promoters, and the herpes simplex virus TK
gene were constructed by inserting the SV40 HindIII-C fragment
(12, 13) into the available HindIII site in the plasmid
pTKX-1 (14). DNA from the SV40 mutant 1265, kindly provided
by C. Cole of Yale University, was digested to completion
with restriction enzymes HindIII and HinfI (New England
BioLabs) simultaneously. The double digestion generated two
fragments larger than 550 base pairs: the HindIII-C fragment
(1099 base pairs; map position 0.649-0.859) and the HinfI-B
fragment (1085 base pairs; 0.992-0.199), which comigrated on
a 1% Seaplaque agarose gel. The 1.1-kilobase (kb) doublet band
was extracted from the gel and ligated with pTKX-1 that had
been digested with HindIII and alkaline phosphatase [as described
by Ullrich et al. (15) except that bovine alkaline phosphatase
(Sigma) was used]. The molar ratio of the vector to
target in the ligation mixture was 3:1. The ligation mixture was
incubated at 4°C for 17 hr with one addition of phage T4 ligase
at 11 hr. The mixture was used to transform Escherichia coli
strain HB101, and ampicillin-resistant colonies were selected.
Colonies carrying the putative pST plasmids were identified
by colony hybridization (16), using SV40 DNA as the probe.
Approximately 20% of the ampicillin-resistant colonies contained
SV40 sequences. Confirmation of the HindIII-C fragment
insertion and determination of its orientation in the
plasmid was done by restriction analysis of mini DNA isolations
(17). A restriction endonuclease map of the plasmid pST6 is
shown in Fig. 1. This work was carried out under P2 containment
in accordance with National Institutes of Health guidelines.
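
The map positions quoted for the two SV40 fragments are fractions of the circular genome, so a fragment length follows from the difference in map units multiplied by the genome size, with wraparound when a fragment crosses the map origin. The Python sketch below illustrates this; it is an addition, not the authors' calculation. The genome size of 5243 base pairs is an assumption (the commonly cited size of wild-type SV40; the mutant used here may differ slightly), and the function name is hypothetical.

```python
SV40_GENOME_BP = 5243  # assumed wild-type SV40 genome size; not stated in the text


def fragment_length_bp(start_map_unit, end_map_unit, genome_bp=SV40_GENOME_BP):
    """Length of a fragment on a circular map, given start and end map units (0-1).

    The modulo handles fragments that cross the map origin, such as 0.992-0.199.
    """
    span = (end_map_unit - start_map_unit) % 1.0
    return span * genome_bp


hind3_c = fragment_length_bp(0.649, 0.859)  # HindIII-C, reported as 1099 bp
hinf1_b = fragment_length_bp(0.992, 0.199)  # HinfI-B, reported as 1085 bp
print(round(hind3_c), round(hinf1_b))
```

Both estimates fall within a few base pairs of the reported sizes, which is consistent with the two fragments comigrating as a 1.1-kb doublet.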




Fig. 1. The circular plasmid pST6, a derivative of pBR322.
Hatched area shows the SV40 insert; stippled area denotes the herpes
simplex virus TK insert. amp^r, ampicillin resistance gene; ori, origin
of DNA replication.
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Table 1. Summary of microinjection data

                            Copies injected               DNA
Exp.  Plasmid               per cell          Offspring   positives
1     pST6                   1,000            78          2
2     pST6                  12,000            10          0
3     pST6 (linearized)      1,000            40          0
4     pST9                   1,000            16          0
5     pRH 1.3Mm 1            1,000            27          0
6     pST12                    600             2          0
7     Uninjected control                      54          0

The pRH 1.3Mm 1 plasmid consists of a cloned fragment of a
member of the highly repeated and interspersed EcoRI-Bgl II sequence
family cloned in pBR322, provided by N. Arnheim (18). pST9
is identical to pST6, except that the orientation of the SV40 fragment
is reversed. pST12 is a dimer of pST6. pST6 was linearized by Sal I
digestion. A total of 187 mice were born from microinjected embryos.

RESULTS 

Results of the plasmid microinjections are summarized in Table 
1. In the first experimental series, injection of several hundred 
embryos yielded 78 live young. DNA was extracted from whole 
newborn mice for rapid and efficient determination of transformation
frequency. The screening method gives a low estimate
of the number of transformants; embryos with transforming
DNA in a small percentage of their cells could have
escaped detection. DNA from 2 of these 78 newborn mice
contained sequences that hybridized strongly with the probe,
pST6. The restriction endonuclease patterns of the incorporated
sequences were significantly different between the two off- 
spring, and are described below. 

DNA from the first positive animal, no. 48, gave two intense 
bands with estimated sizes of 12.9 kb and 9.8 kb and a third 
band of very large size (>24 kb) when digested with BamHI
(Fig. 2). The positions of the two smaller bands were unaffected
by digestion with HindIII, EcoRI, BamHI, or Xba I (Fig. 2).
This result suggested that the TK sequences, which had been
inserted into the BamHI sites, and the SV40 sequences inserted
into the HindIII sites were not present in their native state in
the incorporated material. The HindIII digestion, however, was
incomplete as judged from the control track. We therefore
probed with SV40 DNA alone. No sequences homologous to this
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Fig. 3. DNA from mouse no. 48 digested with HindIII or Pvu II,
or undigested; probed with pST6. Fragment sizes are indicated in
kb. 

probe were detected. The 12.9- and 9.8-kb fragments appeared 
in the undigested sample, consistent with their presence as free 
molecules. Digestion of the DNA with Pvu II generated two 
bands of altered mobility, 2.8 kb and 6.8 kb in size (Fig. 3). This
result indicated that the sequences represented by the 12.9- and
9.8-kb bands contained at least one Pvu II site. We believe these
results, taken together, are consistent with the existence of free 
circular molecules in the DNA of mouse no. 48. 

The second positive, no. 73, showed a markedly different 
blotting pattern. In the undigested DNA, hybridizable material
was not separable from the high molecular weight mouse DNA.
Moreover, digestion with Xba I, which does not cut pST6, gave
a single band of greater size than the highest molecular weight
standard of 23.7 kb. Finally, several bands showed homology
with probes synthesized from either purified SV40 DNA or TK 
fragment (Fig. 4). Thus, this animal had retained all or part of 
these portions of the plasmid. 

Digestion with Bam HI yielded three major bands, 7.8 kb, 
3.9 kb, and 3.4 kb. The largest band showed homology with 




[Fig. 2 lane labels: Xba I | HindIII | EcoRI | BamHI]

Fig. 2. DNA from mouse no. 48 digested with BamHI, EcoRI, HindIII, and Xba I. The labeled probe was pST6 DNA. NC indicates the
negative control (DNA isolated from uninjected mice). Positive controls include (i) NC DNA with SV40 DNA added (SV40) and (ii) NC DNA
with the plasmid pTKX-1 added (pTK). Arrow indicates the high molecular weight band that appears reproducibly in BamHI digests.
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Fig. 4. DNA from mouse no. 73 digested with BamHI and probed
with pST6 (Center), SV40 DNA (Right), or TK fragment (Left).
Positive control (PC) consists of pST6 added to mouse DNA to a 
concentration of 10"^ by weight. NC denotes the negative control 
(DNA isolated from uninjected mice). 

SV40, SV40 + pBR322, and with the whole plasmid, pST6 (Fig.
4). The two smaller bands showed homology with TK fragment, 
but not with SV40 or pBR322 (Fig. 4). The probeable portions 
of these smaller pieces were thus composed entirely of TK- 
derived material. The smallest band, 3.4 kb, closely approxi- 
mates the size of the TK fragment that was inserted into the 
BamHI sites, suggesting that the entire TK gene had been re- 
tained in no. 73. Digestion with Bgl I, however, proved this 
supposition incorrect. An internal fragment of TK approxi- 
mately 1.8 kb in size, defined by Bgl I sites, did not appear in 
the DNA (data not shown). This showed that the 3.4- and 3.9-kb
BamHI fragments were composed of portions of the TK frag- 
ment that had either been concatamerized or complexed with 
mouse DNA to yield molecular weights equal to or greater than 
the molecular weight of the original TK insert. 

Digests with Pvu II and HindIII provided strong evidence
that the entire SV40 sequence was retained. Digestion with
HindIII produced a fragment very close in size to the SV40
insert of pST6. In addition, digestion with Pvu II gave two 
fragments that migrated indistinguishably from the Pvu II- 
defined SV40 fragments of pST6. Thus, two independent ex- 
periments support the contention that the entire SV40 fragment 
was present. 

DISCUSSION 

These data demonstrate that it is possible to use a recombinant 
plasmid as a vector for transfer of foreign genes directly into 
mouse embryos, and that these embryos can maintain the for- 
eign genes throughout development. Moreover, the intensity 
of the bands on Southern blot analysis suggests that most or all 
of the cells of the newborns contained derivatives of the injected 
plasmid. Blotting experiments with hybrid cell populations have 
shown that sequences cannot be detected if present in fewer 
than 10% of the cells (19). We are thus confident that the two 
transformed mice contained enough plasmid DNA for distri- 
bution of one copy to at least this percentage of their cells. Our 
positive controls were adjusted to correspond to one copy of 
pST6 per diploid genome. The band intensities of no. 48 and 
no. 73 are comparable to the control. Thus, the transforming 
sequences are probably present in far greater amounts than the 
10% threshold of detectability; the band intensities are more
consistent with the presence of the plasmid derivative in most 
or all of the cells of the newborns. Our method of analysis 
cannot rule out the possibility that only a few of the cells contained
all of the sequences while most of the cells were negative,
but we consider unlikely the chances that cells carrying a large



amount of additional genetic material would survive and 
compete successfully through development. If the transforming 
sequences were in fact distributed throughout the tissues of the 
mice, then integration must have occurred at an early stage, 
shortly after determination of the inner cell mass. Injection of 
one-celled embryos may be important for obtaining early in- 
tegration. In addition, the high mortality caused by microin- 
jection suggests that injection of only a fraction of the cells of 
a later cleavage stage might result in preferential survival of 
uninjected blastomeres and consequently give a lower rate of 
success. 

The transformation rate reported here compares very fa- 
vorably with other gene transfer systems involving mammalian 
cells. Calcium phosphate-mediated gene transfer into cultured 
cells results in transformation rates of 10"^ to 10"^ (20, 21), 
while microinjection of cultured cells gives approximately 5%
success (22). Our transformation rate agrees well with these
latter results. The reasons for higher rates in microinjection 
experiments are unknown but may include the facts that DNA 
is inserted directly into the nucleus and that gene expression 
is not required in the mouse system. 
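
To make the comparison above concrete, the observed frequency in the first experimental series (2 positives among 78 newborns) can be set beside the roughly 5% figure cited for microinjection of cultured cells. The Python sketch below is an illustration added here, not an analysis from the paper: the Wilson score interval is a standard technique chosen for this example, and the function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts from the first experimental series reported in Table 1.
rate = 2 / 78
low, high = wilson_interval(2, 78)
print(f"observed {rate:.1%}, interval {low:.1%} to {high:.1%}")
```

With only two positive animals the interval is wide, and the 5% figure for cultured cells lies inside it.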

Significant differences were found between the two transformed
mice. In mouse no. 48, SV40 and herpes viral TK DNA
could not be detected. The remaining sequences, derived from 
pBR322, were complexed into three bands, all of higher mo- 
lecular weight than the entire pBR322 plasmid. In addition, two 
of these bands represented DNA that probably existed free of 
the host genome. The presence of unintegrated sequences in 
no. 48 is intriguing. Two plausible models can be invoked to 
explain this observation: (i) these sequences may have replicated 
autonomously and persisted as plasmid-like units; (ii) alterna- 
tively, they may have been generated from an integrated seg- 
ment. The former model requires that the free sequences have 
the capacity to replicate. The plasmid from which they de- 
scended did contain the pBR322 and SV40 origins. But, inter- 
estingly, SV40 DNA is undetectable in the retained material. 
It is also possible that a mouse origin was acquired as a result 
of interaction with the host genome. 

It is more likely that the free sequences were generated from 
integrated material. Generation of free circular DNA from 
transformed cultured cells has been observed previously (23). 
Cells infected with viruses can also generate free DNA from 
the integrated viral genome (24). In addition, cells transformed 
in calcium phosphate-mediated gene transfer experiments can 
pass through an unstable phase during which the donated 
material is maintained independent of the host genome as high 
molecular weight "transgenomes" (25). An important charac- 
teristic of these independent transgenomes is their rapid loss 
from recipient cells; as many as 10% of the cells may lose the 
transforming sequences per day (25). The rearrangement of 
the donor material in no. 48 appears analogous to transgenome 
formation in cultured cells. If the unintegrated sequences were 
similar to independent transgenomes, we would expect them 
to be rapidly lost from the mouse cells during development and 
not detectable in the newborn. The marked intensity of the two 
bands in no. 48 rather suggests that they were continuously 
being produced from an integrated sequence. The presence of 
a high molecular weight band after digestion with BamHI is 
also consistent with the integration model. This band may 
represent material from which the two smaller bands were 
generated. 

In mouse no. 73, no free sequences were present. Both the 
undigested and Xba I-digested samples gave single bands of 
greater size than the highest molecular weight standard. 
Moreover, SV40 and TK sequences were retained in this animal. 
The pattern of bands present in mouse no. 73 is explained best 
by plasmid integration into the mouse genome at a site within 
the TK region. In this model, digestion of the mouse DNA with 
BamHI would generate three plasmid-derived fragments, two 
of which would consist of the TK fragment (now at both ends 
of the integrated molecule) linked to mouse DNA. The third 
fragment would be cleaved from within the integrated plasmid 
and would contain the SV40 and pBR322 moieties. The pre- 
dicted size of this internal fragment is 5.5 kb. This model also 
predicts that the TK fragment would be disrupted and that the 
SV40 and pBR322 sequences would be intact. The DNA of 
mouse no. 73 contained two bands of 3.4 and 3.9 kb that hy- 
bridized only with the purified TK fragment and contained no 
sequences homologous to SV40 or pBR322, and a band of 7.8 
kb that hybridized to SV40 and not to TK. The large size of this 
fragment relative to the expected 5.5-kb fragment might be due 
to partial internal duplication, which is consistent with inde- 
pendent observations of SV40 integration (26, 27). Digestion 
of the DNA of mouse no. 73 with Bgl I or with Pvu II failed to 
generate expected fragments from within the TK insert but 
indicated that most or all of pBR322 and SV40 were present. 
Additionally, HindIII digestion generated a band of the ex- 
pected size of the SV40 insert, indicating that all of the SV40 
sequences present on pST6 were also present in the DNA of 
mouse no. 73 (data not shown). Thus, our observations are 
consistent with a single integration event. 

An important similarity between the two positive mice was 
the extensive rearrangement of the sequences. In the first in- 
stance, SV40 and herpes virus TK sequences were largely if not 
entirely removed from the injected DNA. In the second case, 
SV40 sequences and herpes virus TK sequences were demon- 
strable, but the TK gene was significantly rearranged. These 
observations raise the possibility that selection occurred against 
embryos that retained the TK gene intact and in an active state. 
The possibility that herpes virus TK is teratogenic to mouse 
embryos is consistent with our data. We consider this notion 
unlikely, however, because cells transformed in culture and 
under selection for TK demonstrate similar patterns of rear- 
rangement (25, 28). 

These initial results show that genetic transformation can be 
extended to whole mammalian organisms at a very early stage 
in their development. Further refinement of these techniques 
should lead to a reliable system of embryo transformation with 
its attendant applications for investigation of problems in de- 
velopment and cell differentiation. 

Note Added in Proof. We have produced a third transformant by 
injection of 30,000 copies per cell of the plasmid pST9. Restriction 
analysis indicates that, as in mouse no. 73, the transforming sequences 
are integrated. Initial studies also indicate that at least one complete 
copy each of both the herpes virus TK and SV40 regions has been re- 
tained in this animal. 

Fulminant hypertension in transgenic rats harbouring the mouse 
Ren-2 gene. 

Mullins JJ, Peters J, Ganten D. 

German Institute for High Blood Pressure Research, University of Heidelberg. 

PRIMARY hypertension is a polygenic condition in which blood pressure is 
enigmatically elevated; it remains a leading cause of cardiovascular disease and 
death due to cerebral haemorrhage, cardiac failure and kidney disease. The 
genes for several of the proteins involved in blood pressure homeostasis have 
been cloned and characterized, including those of the renin-angiotensin system, 
which plays a central part in blood pressure control. Here we describe the 
introduction of the mouse Ren-2 renin gene into the genome of the rat and 
demonstrate that expression of this gene causes severe hypertension. These 
transgenic animals represent a model for hypertension in which the genetic basis 
for the disease is known. Further, as the transgenic animals do not overexpress 
active renin in the kidney and have low levels of active renin in their plasma, 
they also provide a new model for low-renin hypertension. 

Production of transgenic rabbits, sheep and pigs by 
microinjection. 

Hammer RE, Pursel VG, Rexroad CE Jr, Wall RJ, Bolt DJ, Ebert KM, 
Palmiter RD, Brinster RL. 

Direct microinjection has been used to introduce foreign DNA into a number of 
terminally differentiated cell types as well as embryos of several species 
including sea urchin, Caenorhabditis elegans, Xenopus, Drosophila and mice. Various 
genes have been successfully introduced into mice including constructs 
consisting of the mouse metallothionein-I (MT) promoter/regulator region fused 
to either the rat or human growth hormone (hGH) structural genes. Transgenic 
mice harbouring such genes commonly exhibit high, metal-inducible levels of 
the fusion messenger RNA in several organs, substantial quantities of the 
foreign growth hormone in serum and enhanced growth. In addition, the gene is 
stably incorporated into the germ line, making the phenotype heritable. Because 
of the scientific importance and potential economic value of transgenic 
livestock containing foreign genes, we initiated studies on large animals by 
microinjecting the fusion gene, MT-hGH, into the pronuclei or nuclei of eggs 
from superovulated rabbits, sheep and pigs. We report here integration of the 
gene in all three species and expression of the gene in transgenic rabbits and 
pigs. 

Rabbit β-Casein Promoter Directs Secretion of 
Human Interleukin-2 into the Milk of 
Transgenic Rabbits 

Th. A. Buhler, Th. Bruyere, D. F. Went, G. Stranzinger & K. 
Burki 

To test the potential usefulness of transgenic rabbits as 
production systems for human proteins of pharmaceutical 
value, we cloned the rabbit β-casein promoter and fused it 
to the genomic sequence of the human interleukin-2 (hIL2) 
gene. Four transgenic female rabbits were tested for 
expression and biological activity of the foreign protein in 
their milk. The milk of all four females proved to contain 
biologically active hIL2. The results show that transgenic 
rabbits may represent a convenient and economic system 
for the rapid production of biologically active protein in 

Effect of transgenic GDNF expression on gentamicin-induced 
cochlear and vestibular toxicity. 

Suzuki M, Yagi M, Brown JN, Miller AL, Miller JM, Raphael Y. 

Kresge Hearing Research Institute, The University of Michigan, Ann Arbor 
48109-0648, USA. 

Gentamicin administration often results in cochlear and/or vestibular hair cell 
loss and hearing and balance impairment. It has been demonstrated that 
adenovirus-mediated overexpression of glial cell line-derived neurotrophic 
factor (GDNF) can protect cochlear hair cells against ototoxic injury. In this 
study, we evaluated the protective effects of adenovirus-mediated 
overexpression of GDNF against gentamicin ototoxicity. An adenovirus vector 
expressing the human GDNF gene (Ad.GDNF) was administered into the scala 
vestibuli as a rescue agent at the same time as gentamicin, or as a protective 
agent, 7 days before gentamicin administration. Animals in the Rescue group 
displayed hearing thresholds that were significantly better than those measured 
in the Gentamicin or Ad.LacZ/Gentamicin groups. In the Protection group, 
Ad.GDNF afforded significant preservation of utricular hair cells. The data 
demonstrated protection of the inner ear structure, and rescue of the inner ear 
structure and function against ototoxic insults. These experiments suggest that 
inner ear gene therapy may be developed as a clinical tool for protecting the ear 
against environmentally induced insults. 

PMID: 10871754 [PubMed - indexed for MEDLINE] 

Spiral Ganglion Neurons Are Protected from Degeneration by GDNF Gene 
Therapy 

Masao Yagi, Sho Kanzaki, Kohei Kawamoto, Brian Shin, Pratik P. Shah, Ella Magal. 

Abstract: 

Perceptual benefits from the cochlear prosthesis are related to the quantity and 
quality of the patient's auditory nerve population. Multiple neurotrophic factors, such 
as glial cell line-derived neurotrophic factor (GDNF), have been shown to have 
important roles in the survival of inner ear auditory neurons, including protection of 
deafferented spiral ganglion cells (SGCs). In this study, GDNF gene therapy was 
tested for its ability to enhance survival of SGCs after aminoglycoside/diuretic- 
induced insult that eliminated the inner hair cells. The GDNF transgene was 
delivered by adenoviral vectors. Similar vectors with a reporter gene (lacZ) insert 
served as controls. Four or seven days after bilateral deafening, 5 µl of an 
adenoviral suspension (Ad-GDNF or Ad-lacZ) or an artificial perilymph was injected 
into the left scala tympani of guinea pigs. Animals were sacrificed 28 days after 
deafening and their inner ears prepared for SGC counts. Adenoviral-mediated GDNF 
transgene expression enhanced SGC survival in the left (viral-treated) deafened 
ears. This observation suggests that GDNF is one of the survival factors in the inner 
ear and may help maintain the auditory neurons after insult. Application of GDNF 
and other survival factors via gene therapy has great potential for inducing survival of 
auditory neurons following hair cell loss. 

Introduction of the human growth hormone gene into the guinea 
pig mammary gland by in vivo transfection promotes sustained 
expression of human growth hormone in the milk throughout 
lactation. 

Hens JR, Amstutz MD, Schanbacher FJL, Mather IH. 

Department of Animal and Avian Sciences, University of Maryland, College 
Park 20742, USA. 

We tested the feasibility of transfecting mammary tissue in vivo with an 
expression plasmid encoding the human growth hormone (hGH) gene, under the 
control of the cytomegalovirus promoter. Guinea pig mammary glands were 
transfected with plasmid DNA infused through the nipple canal and expression 
was monitored in control and transfected glands by radioimmunoassay of milk 
samples for hGH. Sustained expression of hGH throughout lactation was 
attained with a polyion transfection complex shown to be optimal for the 
transfection of bovine mammary cells, in vitro. However, contrary to 
expectations, hGH expression was consistently 5- to 10-fold higher when 
DEAE-dextran was used alone for transfection. Thus polyion complexes which 
are optimal for the transfection of cells in vitro may not be optimal in vivo. The 
highest concentrations of hGH in milk were obtained when glands were 
transfected within 3 days before parturition. This method may have application 
for studying the biological role or physical properties of recombinant proteins 
expressed in low quantities, or for investigating the regulation of gene 
promoters without the need to construct viral vectors or produce transgenic 
animals. 

High-level synthesis of a heterologous milk protein in the mammary 
glands of transgenic swine 

ABSTRACT The whey acidic protein (WAP) is a major 
milk protein in mice, rats, and rabbits but has not been found 
in milk of livestock including swine. To determine whether 
mammary gland regulatory elements from the WAP gene 
function across species boundaries and whether it is possible to 
qualitatively alter milk protein composition, we introduced the 
mouse WAP gene into the genome of swine. Three lines of 
transgenic swine were analyzed, and mouse WAP was detected 
in milk from all lactating females at concentrations of about 1 
g/liter; these levels are similar to those found in mouse milk. 
Expression of the corresponding RNA was specific to the 
mammary gland. Our results suggest that the molecular basis 
of mammary-specific gene expression is conserved between 
swine and mouse. In addition the WAP gene must share, with 
other milk protein genes, elements that target gene expression 
to the mammary gland. Mouse WAP accounted for about 3% 
of the total milk proteins in transgenic pigs, thus demonstrating 
that it is possible to produce high levels of a foreign protein in 
milk of farm animals. 



Milk protein genes are transcribed in the mammary gland of 
lactating animals, and the encoded proteins are secreted in 
large quantities into milk. The whey acidic protein (WAP) is 
an abundant milk protein in mice (1, 2) but has not been found 
in swine or other livestock. Expression of the WAP gene is 
confined to the mammary gland (2, 3) and is under the control 
of steroid and peptide hormones as well as other develop- 
mental signals during pregnancy (4-6). 

By targeting synthesis of foreign proteins to the mammary 
gland of transgenic animals, it should be possible to produce 
valuable proteins on a large scale in milk (7, 8). The combined 
properties of high activity and tissue-specificity make the 
murine WAP gene promoter a good candidate for targeting 
gene expression to the mammary gland. Towards this end we 
previously have expressed a hybrid gene containing regula- 
tory elements from the mouse WAP gene and coding se- 
quences from human tissue plasminogen activator in the 
mammary gland of transgenic mice (5, 6) and analyzed the 
protein in milk (5, 9). By characterizing the WAP gene, it may 
be possible to use its control elements to target expression of 
hybrid genes in farm animals. However, it is not known 
whether mammary regulatory elements are gene specific and 
whether they are functional across species boundaries. In 
addition, it is not known if the presence of a novel protein 
may adversely affect the physiology of the mammary gland. 
To address these questions we introduced the unmodified 
mouse WAP gene (10) into swine, which themselves do not 
contain a WAP gene, and analyzed expression of RNA and 
protein. With this approach, potential problems in interpret- 
ing expression data from hybrid genes would not be a factor. 
Also, potential deleterious physiological effects of a foreign 
protein might be minimized because the target gene encodes 
a milk protein that would be confined primarily to the 
mammary gland. 

Swine were chosen for these studies because they offer 
both economy in animal resources and time when compared 
to ruminantia as a transgenic animal model and because the 
questions being addressed did not require harvesting large 
quantities of milk that would be more easily obtained from 
dairy animals such as cows, goats, or sheep. The two primary 
constraints in any large animal transgenic project are the 
number of fertilized ova obtainable and the number of 
embryo recipients available. On average it is possible to 
recover 2-3 times more injectable ova per donor gilt than can 
be collected from a cow, doe, or ewe. The efficiency of 
producing expressing transgenic pigs or sheep per injected 
ovum is about 0.3% (calculated from refs. 11 and 12). Though 
a live-born expressing transgenic calf has not been reported, 
a larger number of ova will probably be required to produce 
an expressing transgenic cow (13). Furthermore, because 
swine are polytocous, a recipient sow can carry 5 times as 
many fetuses as a cow, doe, or ewe. Additionally, the 
generation interval of swine is ≈11 months, whereas that of 
goats is between 11 and 21 months and that of cattle at least 
24 months. Considering all of these factors, the use of swine 
rather than cows, goats, or sheep requires one-sixth the 
number of animals, with results obtainable in less than half 
the time. 
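
The efficiency figure quoted above can be turned into a rough planning estimate. The following is only an illustrative sketch: it assumes that each injected ovum is an independent trial with the roughly 0.3% success rate cited in the text, an assumption the paper itself does not state, and the function names are hypothetical.

```python
import math

# Efficiency of producing an expressing transgenic pig or sheep per
# injected ovum, as quoted in the text (about 0.3%).
EFFICIENCY = 0.003


def expected_ova_per_animal(p: float) -> float:
    """Mean number of injected ova per expressing transgenic animal."""
    return 1.0 / p


def ova_for_confidence(p: float, confidence: float) -> int:
    """Smallest n such that P(at least one success in n ova) >= confidence.

    Assumes independent ova, which is a simplification made for this sketch.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    print(round(expected_ova_per_animal(EFFICIENCY)))
    print(ova_for_confidence(EFFICIENCY, 0.95))
```

Under these assumptions, roughly 330 injected ova are needed on average per expressing animal, which illustrates why the number of obtainable ova is described as a primary constraint.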

MATERIALS AND METHODS 

Production of Transgenic Pigs. Ovulation control and egg 
recovery were performed as described (14). Briefly, the time 
of ovulation of sexually mature gilts was controlled by 
feeding 15 mg of Altrenogest (R-2267, 17-allyl-hydroxyestra- 
4,9,11-trien-3-one, Roussel-Uclaf) daily for 5-9 days, begin- 
ning on day 12 and ending on day 15 of the estrous cycle. 
Twenty-four hours after the last feeding of Altrenogest, each 
gilt was given 1000 to 2000 international units of pregnant 
mare's serum gonadotropin (PMSG) by subcutaneous injec- 
tion, and 79 hr later each gilt was given an intramuscular 
injection of 500 international units of human chorionic gona- 
dotropin (hCG). Estrus behavior was monitored, and embryo 
donor gilts were either bred with a fertile boar or were 
artificially inseminated with fresh semen twice during estrus. 

Abbreviation: WAP, whey acidic protein. 

‡Present address: Hoechst AG, Frankfurt, Federal Republic of 
Germany. 

§To whom reprint requests should be addressed. 

Agricultural Sciences: Wall et al. 
Proc. Natl. Acad. Sci. USA 88 (1991) 

Approximately 58-61 hr after the hCG injection (18-21 
hours after the expected time of ovulation), the reproductive 
tracts of donor gilts were exposed by midventral laparotomy 
during general anesthesia. Ova were recovered by flushing 20 
ml of Dulbecco's phosphate-buffered saline (15) from the 
uterotubal junction through the cannulated infundibular end 
of each oviduct. Recovered ova were immediately trans- 
ferred into BMOC buffer (16) prior to microinjection and 
maintained at 38°C. 
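
The timing of the superovulation and recovery steps can be laid out as a simple schedule. The sketch below only restates the intervals given in the text (PMSG 24 hr after the last Altrenogest feeding, hCG 79 hr later, ova recovery 58-61 hr after hCG); the expected ovulation time is derived from the two recovery windows, and all names are hypothetical.

```python
# Intervals in hours, taken from the Materials and Methods text.
PMSG_AFTER_LAST_ALTRENOGEST = 24
HCG_AFTER_PMSG = 79
RECOVERY_AFTER_HCG = (58, 61)        # hours after the hCG injection
RECOVERY_AFTER_OVULATION = (18, 21)  # hours after expected ovulation


def schedule() -> dict:
    """Return event times in hours after the last Altrenogest feeding."""
    pmsg = PMSG_AFTER_LAST_ALTRENOGEST
    hcg = pmsg + HCG_AFTER_PMSG
    # Expected ovulation relative to hCG follows from the two windows:
    # 58 - 18 = 61 - 21 = 40 hr.
    ovulation_after_hcg = RECOVERY_AFTER_HCG[0] - RECOVERY_AFTER_OVULATION[0]
    return {
        "PMSG": pmsg,
        "hCG": hcg,
        "expected ovulation": hcg + ovulation_after_hcg,
        "ova recovery (earliest)": hcg + RECOVERY_AFTER_HCG[0],
        "ova recovery (latest)": hcg + RECOVERY_AFTER_HCG[1],
    }


if __name__ == "__main__":
    for event, hours in schedule().items():
        print(f"{event}: {hours} hr")
```

Both ends of the recovery window imply the same expected ovulation time, about 40 hr after hCG, so the two statements in the text are mutually consistent.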

Pig ova are optically opaque and, as a consequence, their 
nuclear structures are not visible. However, centrifuging ova 
at ≈15,000 × g for 3-8 min displaces the opaque material in 
the cytoplasm, thereby allowing the nuclear structures to be 
visualized (14). Pig ova were centrifuged, and a pronucleus of 
one-celled ova or both nuclei of two-celled ova were injected 
with a TE solution (1 mM Tris-HCl/0.1 mM EDTA, pH 7.2) 
containing ng of a 7.2-kilobase (kb) EcoRI fragment per 
µl that contained the mouse WAP gene (10). The fragment 
contained the entire transcribed region with its four exons, 
three introns, and 2.6-kb 5' and 1.6-kb 3' flanking sequences. 
Microinjections were performed with the aid of differential 
interference contrast optics at 200-fold magnification, essen- 
tially as described for mouse ova (17). 
tially as described for mouse ova (17). 

Between 20 and 30 injected ova were deposited into the 
ampullar region of one oviduct of each recipient gilt whose 
reproduction cycle had been synchronized with Altrenogest 
(but not superovulated — i.e., not given PMSG) or whose 
estrous cycle naturally coincided with the desired stage. 
Some recipients also received 2-4 uninjected control ova to 
increase the likelihood of maintaining pregnancy in the event 
that a majority of the microinjected eggs failed to develop. 
Time between microinjection and embryo transfer was about 
30 min. 

To identify transgenic piglets, DNA from tail biopsies was 
prepared and analyzed for the mouse WAP gene by Southern 
blotting. Offspring in the F1 generation were analyzed by the 
polymerase chain reaction by using primers specific to the 
WAP gene. 

Analysis of Mouse WAP. Milk whey proteins were sepa- 
rated under denaturing conditions in sodium dodecyl sulfate 
(SDS)/16% polyacrylamide gels and either stained with 
Coomassie Blue or transferred to nitrocellulose filters. After 
transfer the membrane was incubated overnight in TBS (20 
mM Tris-HCl, pH 7.5/500 mM NaCl) containing 3% gelatin 
and then was washed in TTBS (TBS containing 0.05% Tween 
20). The membrane was then probed for 90 min with a 1:200 
dilution of rabbit anti-WAP serum, followed by washing and 
incubation with alkaline phosphatase-conjugated goat anti- 
rabbit IgG in TBS containing 1% bovine serum albumin for 1 
hr. The antibody-antigen complexes were stained with ni- 
troblue tetrazolium and 5-bromo-4-chloro-3-indolyl phos- 
phate in 100 mM Tris-HCl, pH 9.5/100 mM NaCl/5 mM 
MgCl2. 

Isolation of RNA and Northern Blot Analysis. During 
necropsy, tissues were immediately placed in liquid nitrogen 
and stored at -80°C, and total RNA was isolated (18). RNA 
samples containing 1 µl of ethidium bromide solution (1 
mg/ml) were electrophoresed in 1.5% agarose/formaldehyde 
gels. The gels were blotted onto GeneScreenPlus nylon 
membranes, which were then probed with a randomly primed 
labeled 450-base-pair (bp) cDNA fragment that spanned the 
mouse WAP coding region. 

RESULTS 

The Mouse WAP Gene in Transgenic Swine. Eight-hundred 
and fifty ova were recovered and microinjected, of which 
two-thirds were at the one-cell stage of development. The 
injected DNA contained 7.2 kb of the mouse WAP gene (see 
Materials and Methods, ref. 10). The microinjected ova along 
with 34 control ova were transferred into 29 recipient gilts. 
Twenty-two of the recipients carried their pregnancies to 
term, resulting in the birth of 189 pigs. DNA analysis of tail 
biopsies revealed that 5 (2 males and 3 females) of the piglets 
had incorporated the mouse WAP gene into their genomes. 
Approximately 1% of the injected ova resulted in transgenic 
founders. From other transgenic pig projects using different 
gene constructs, the efficiency of producing founder pigs was 
similar (11). In this study one pig was stillborn and one died 
shortly after birth. Such deaths are not uncommon in the pig 
industry, where neonate mortality is in the range of 15-20%. 
Lines from the three surviving pigs were established, and 
offspring were analyzed. Male founder 1301 was bred to three 
nontransgenic females; 4 of 32 offspring were transgenic, 
suggesting that he was mosaic for the WAP gene. Transgenic 
mouse breeding studies have estimated that about 30% of 
transgenic founders are germ-line mosaics (19). Based on 
Southern blot analyses, this line contains ≈10 intact copies of 
the WAP gene in a head-to-tail arrangement at a single locus. 
Female founder 2202 carried ≈15 copies of the WAP gene. 
She was bred at 8 months of age; 4 of 9 offspring were 
transgenic. She was bred a second time and died of an 
unknown cause 4 days before anticipated parturition. The 
two transgenic daughters from her first litter were also bred, 
and after farrowing, milk and RNA were analyzed. Female 
founder 1302, carrying ≈10 copies of the WAP gene, was 
unsuccessfully bred three times. After the third failure, she 
was superovulated as a means of diagnosing the cause of her 
reproductive failures and to collect eggs if the cause did not 
involve ovarian dysfunction. Twenty-eight ova were recov- 
ered and transferred to two recipients. From these, 20 piglets 
were born of which 8 were transgenic. Apparently not all of 
female founder 1302's eggs had been recovered because she 
subsequently gave birth to 9 piglets, 5 of which were trans- 
genic. 
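
The counts reported in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the 50% transmission rate used for comparison is an assumption (a non-mosaic founder carrying the transgene at a single locus on one chromosome), not a figure from the paper.

```python
from math import comb


def rate(successes: int, total: int) -> float:
    """Simple proportion."""
    return successes / total


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Counts taken from the text.
founders_per_injected_ovum = rate(5, 850)  # 5 founders from 850 injected ova
founders_per_piglet_born = rate(5, 189)    # 5 founders among 189 piglets
pregnancies_to_term = rate(22, 29)         # 22 of 29 recipients

# Founder 1301: 4 of 32 offspring transgenic. Assumed expectation for a
# non-mosaic founder with one integration locus: 50% transmission.
p_1301 = binom_cdf(4, 32, 0.5)
# Founder 2202: 4 of 9 offspring transgenic, same assumed expectation.
p_2202 = binom_cdf(4, 9, 0.5)

if __name__ == "__main__":
    print(f"founders per injected ovum: {founders_per_injected_ovum:.2%}")
    print(f"founders per piglet born:   {founders_per_piglet_born:.2%}")
    print(f"pregnancies carried to term: {pregnancies_to_term:.0%}")
    print(f"P(<=4 of 32 | p=0.5) = {p_1301:.2e}")
    print(f"P(<=4 of 9  | p=0.5) = {p_2202:.2f}")
```

Computed directly from the counts, the founder rate per injected ovum comes out somewhat below the "approximately 1%" quoted in the text. Under the assumed 50% expectation, 4 transgenic offspring out of 32 would be very improbable, which fits the authors' suggestion that founder 1301 was mosaic, whereas 4 of 9 for founder 2202 is unremarkable.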

Secretion of Mouse WAP into Pig Milk. Expression of the 
WAP transgene in transgenic pigs was evaluated by both 
protein and RNA analyses. Milk from female founder 2202 
and her daughter 5403, from two daughters (5511 and 5701) of 
male founder 1301, and from female founder 1302, was 
analyzed for the presence of mouse WAP. Milk proteins were 
separated in SDS/polyacrylamide gels and either stained 
with Coomassie blue or blotted onto nitrocellulose mem- 
branes and analyzed with anti-mouse WAP antibodies. WAP 
has a molecular mass of about 14 kDa (Fig. 1A, lane 8) and, 
at a concentration of about 2 mg per ml, constitutes the major 




Fig. 1. Secretion of mouse WAP into milk of transgenic pigs. 
Milk proteins (20 µg) were separated in SDS/polyacrylamide gels 
and either stained (A) or analyzed with rabbit anti-WAP antibodies 
(B). Lanes: 1, molecular mass markers (14, 18, 29, 45, 68, and 96 
kDa); 2, total mouse whey proteins; 3-7, milk from nontransgenic pig 
(lane 3), pig 2202 (lane 4), pig 5403 (lane 5), pig 5701 (lane 6), and pig 
5511 (lane 7); 8, 1 µg of purified mouse WAP. 



whey protein in mice (Fig. 1A, lane 2). A protein comigrating 
with mouse WAP was present in the milk of transgenic pigs 
(Fig. 1A, lanes 4-7) but not in milk from a nontransgenic 
control pig (Fig. 1A, lane 3). In addition, a 14-kDa protein in 
milk from transgenic, but not from nontransgenic, pigs re- 
acted strongly with anti-mouse WAP antibodies (Fig. 1B). 
The lower molecular mass material reacting with anti-WAP 
antibodies probably reflects degradation products of the 
WAP. Taken together, these results show that the mouse WAP gene 
was expressed in transgenic pigs, and the encoded protein 
was secreted into the milk. The level of mouse WAP in the 
milk of each transgenic pig was determined by ELISA. By 
setting the level of WAP in mouse milk arbitrarily at 100%, 
animals 2202 and 5403 (line 2202) and animals 5701 and 5711 
(line 1301) were shown to express WAP at about 100%, and 
female founder 1302, at about 50%. Thus, about 1-2 g of WAP 
was present per liter of pig milk. 
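
The 1-2 g per liter estimate follows from the ELISA percentages and the mouse-milk reference value of about 2 mg per ml given earlier in the text. A minimal sketch of that conversion, with hypothetical names and the percentages treated as exact for illustration:

```python
# Reference concentration of WAP in mouse milk, from the text (about 2 mg/ml).
MOUSE_WAP_MG_PER_ML = 2.0

# Relative ELISA levels reported in the text (mouse milk set to 100%).
RELATIVE_LEVELS = {
    "line 2202": 1.00,
    "line 1301": 1.00,
    "founder 1302": 0.50,
}


def wap_g_per_liter(relative: float,
                    reference_mg_per_ml: float = MOUSE_WAP_MG_PER_ML) -> float:
    """Convert a relative ELISA level to grams of WAP per liter of milk.

    1 mg/ml is numerically equal to 1 g/liter, so no extra factor is needed.
    """
    return relative * reference_mg_per_ml


if __name__ == "__main__":
    for name, level in RELATIVE_LEVELS.items():
        print(f"{name}: about {wap_g_per_liter(level):.0f} g/liter")
```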

WAP is secreted into mouse milk during the entire lacta- 
tional period. To determine whether the expression in trans- 
genic pigs paralleled this pattern, we analyzed WAP levels in 
the milk of founder female 1302 over a 4-week lactational 
period (Fig. 2). Whey samples were separated in SDS/ 
polyacrylamide gels and either stained (Fig. 2A) or analyzed 
with anti-WAP antibodies (Fig. 2B). Constant levels of WAP 
were found over a 26-day period. This suggests that, at least 
over this period of time, the WAP transgene was coordinately 
regulated with other pig milk protein genes. 

Expression of Mouse WAP RNA in Pigs. To correlate the 
level of WAP in milk with the corresponding RNA in mam- 
mary tissue, founder female 2202 was biopsied 11 days 
postpartum, and mammary RNA was analyzed with a mouse- 
specific WAP cDNA. An RNA of about 600 nucleotides 
hybridized with the WAP probe (Fig. 3, lanes b and c), 
confirming mouse WAP gene expression in the mammary 
glands of transgenic pigs. Furthermore, the RNA levels in pig 
2202 and mouse were similar; this agrees with the WAP levels 
found in the milk. The WAP RNA in pig 2202 appeared to be 

Fig. 2. Expression of mouse WAP during the lactational period 
of pig 1302. Milk samples were collected at various days after 
parturition as indicated, and whey fractions were prepared. Upon gel 
separation, samples were either stained (A) or analyzed with anti- 
WAP antibodies (B). 

Fig. 3. Expression of mouse WAP RNA in transgenic pigs. 
Mammary RNAs (5 µg) from a lactating nontransgenic pig (lane a), 
founder pig 2202 (lane b), and a mouse (lane c) were separated in a 
formaldehyde gel, transferred to a nylon membrane, and analyzed 
with a cloned cDNA probe specific for mouse WAP RNA. 

about 10-20 nucleotides shorter than its counterpart in mice 
(Fig. 3). Since the protein coding region was intact, the 
smaller size may be due to differences in polyadenylylation. 
RNA from a nontransgenic pig did not hybridize with the 
WAP probe (Fig. 3, lane a), verifying the absence of an 
endogenous WAP RNA in the pig mammary gland. 

In lactating mice the WAP gene is expressed almost 
exclusively in the mammary gland with levels in nonmam- 
mary tissues at least 4 orders of magnitude lower (5). To test 
whether the 7.2-kb WAP transgene contained elements for 
stringent tissue specificity observed in mice, we analyzed 
tissues from lactating pigs from lines 2202 and 1301 for the 
presence of WAP RNA (Fig. 4). To demonstrate potential 
WAP expression in nonmammary tissues, we exposed the 
RNA blot for 24 hr (Fig. 4 a and c). The specificity of WAP 
hybridization and the quantity of WAP RNA in the mammary 
gland were assessed in a 30-min exposure (Fig. 4b). In animal 
5701 (line 1301), WAP RNA was only found in the mammary 
gland (Fig. 4c) at a level similar to that seen in a 10-day 
lactating mouse. The sensitivity of the assay would have 

Fig. 4. Tissue distribution of WAP RNA in transgenic pigs. Pigs 
5403 (a) and 5701 (c) were sacrificed, and RNA was prepared from 
several tissues. Upon separation in formaldehyde gels and transfer to 
nylon membranes, the RNA was analyzed with a probe specific for 
mouse WAP RNA. Lanes: MM, mouse mammary gland; PM, pig 
mammary gland; A, adrenals; B, brain; H, heart; K, kidney; L, liver; 
Lu, lung; Ly, lymph node; O, ovaries; Ov, oviduct; P, pituitary; S, 
salivary gland; Sp, spleen; Th, thymus; T, tongue; U, uterus; V, 
vulva. In a and c, 20 µg of total RNA was loaded in lanes with the 
exception of mouse mammary gland (lane MM), where 4 µg was 
loaded. (b) One-hour exposure of the MM and PM lanes of a. Arrows 
indicate the position of WAP RNA. 

permitted detection of WAP RNA levels 1000-fold lower than 
that observed. The level of WAP RNA in animal 5403 (line 
2202) was about 80% of that seen in mouse (Fig. 4a). The 
lower molecular mass band in the vulva RNA from animal 
5701 was not reproducible and probably reflects a gel or 
blotting artifact. In animal 5403 WAP expression was de- 
tected in salivary gland, although at a level of only 1% of that 
seen in mammary tissue (Fig. 4a). Low-level expression in 
the salivary gland also has been described for other trans- 
genes containing regulatory elements from milk protein genes 
(5, 20). Although the salivary gland and mammary gland have 
similar developmental patterns in that they require interaction 
between epithelial and mesenchymal tissue for proper 
duct formation to occur (21, 22), they are not considered 
closely related. In contrast, sebaceous glands have a common 
developmental origin with the mammary gland. However, 
no WAP transcripts were found in tissue taken from the 
vulva (Fig. 4), which is rich in sebaceous glands. 

DISCUSSION 

Three lines of transgenic swine containing the mouse WAP 
gene have been generated and analyzed. Although swine does 
not contain an endogenous WAP gene, its transcription 
machinery recognized the mouse WAP transgene in a tissue-specific 
manner, and mouse WAP was secreted into milk 
from founder swine as well as their offspring at levels similar 
to those seen in mouse milk. Thus, the molecular basis for 
mammary-specific gene expression is conserved between 
swine and mouse, and it can be suggested that the mouse 
WAP gene shares mammary regulatory elements with pig 
milk protein genes. 

Expression levels of the mouse WAP genes in three lines 
of transgenic pigs described here and in three additional lines 
(unpublished data), which carry between 10 and 20 copies of 
the transgene, were consistently high and at a level compa- 
rable to the expression level of the endogenous gene in mice. 
Activity of the WAP gene in pigs appears to be relatively 
independent of the site of integration into host chromosomes 
and also independent of the gene copy number. In contrast, 
expression of the same 7.2-kb mouse WAP gene in transgenic 
mice was highly dependent on the integration site of the 
transgene (36). It remains to be determined whether the 
consistently high-level expression in transgenic pigs reflects 
special properties of the WAP gene, such as the presence of 
dominant transcription elements, or whether the pig genome 
provides a unique permissive environment for transgene 
expression. A host of other transgenic swine projects (23) 
argues against the latter explanation. Data from the sheep 
β-lactoglobulin gene (24), the rat WAP (25) and β-casein (26) 
genes, and several hybrid genes containing mammary regu- 
latory elements (27-30) have shown that expression was 
influenced by the site of integration in transgenic mice. At a 
minimum the present study suggests that WAP gene regula- 
tion is different in mice and swine. 

This study shows that it is feasible to synthesize and 
secrete a heterologous milk protein in the milk of farm 
animals at relatively high concentrations, i.e., more than 1 
g/liter. Clark and colleagues had shown that hybrid genes 
containing regulatory elements from the sheep β-lactoglobulin 
gene are expressed in the mammary glands of transgenic 
sheep (31). However, the concentrations of the encoded 
proteins factor IX and α1-antitrypsin were only 25 μg/liter 
and 5 mg/liter, respectively (31). With another transgene, this 
group produced human α1-antitrypsin in mouse milk at levels 
of more than 1 g/liter (20). Therefore, the ability of a 
transgene to be expressed in the mammary gland at high 
levels does not appear to be related to the nature of the 
encoded protein (milk protein versus foreign protein) but 
rather to the presence of appropriate transcription elements. 



We are currently testing the ability of the mouse WAP gene 
promoter to control expression of non-WAP structural gene 
sequences in pigs. 

The concentration of the transgene product produced in 
this study should be encouraging to those who envision using 
the mammary gland as a bioreactor for the production of 
foreign proteins as an economically viable alternative to 
existing tissue and microbial culture systems (7, 8). Swine 
produce about 10 kg of milk per day (32), and, based on the 
expression levels discussed here, it should be possible to 
produce the protein of interest at a rate of about 1 kg per 
lactational period of 7 weeks. Since the WAP gene promoter 
is active in pigs during their entire lactational period, this 
appears to be an achievable goal, and one sow could satisfy 
the current world demand for blood clotting factor IX. Alternatively, 
for the dairy industry, the modification of the composition 
of milk proteins themselves may be desirable so that 
overexpressing heterologous or endogenous milk proteins 
would result in novel milk products (33). 
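
The production estimate above can be checked with simple arithmetic. The following is only an illustrative sketch: the daily milk yield (10 kg), the lactation length (7 weeks), and the concentration (more than 1 g/liter) come from the text, while the milk density of about 1 kg per liter and the function name are assumptions introduced here.

```python
def protein_yield_kg(milk_kg_per_day, lactation_days, protein_g_per_liter,
                     milk_density_kg_per_liter=1.0):
    """Estimate the total transgene product (kg) secreted over one lactation.

    milk_density_kg_per_liter is an assumed value, not a figure from the text.
    """
    milk_liters = milk_kg_per_day * lactation_days / milk_density_kg_per_liter
    return milk_liters * protein_g_per_liter / 1000.0  # grams -> kilograms


lactation_days = 7 * 7  # lactational period of 7 weeks
for concentration in (1.0, 2.0):  # g/liter; the text reports "more than 1"
    total = protein_yield_kg(10, lactation_days, concentration)
    print(f"{concentration} g/liter -> {total:.2f} kg per lactation")
```

Under these assumptions, 1 g/liter corresponds to roughly 0.5 kg per lactation and 2 g/liter to roughly 1 kg, so the stated figure of about 1 kg is of the right order for concentrations somewhat above 1 g/liter.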

As with other expression systems, high activity of the 
transgene could have adverse effects on the physiology of the 
mammary gland. Pigs from two lines (1301 and 2202) were 
unable to sustain lactation. In contrast, lactation persisted 
normally in female founder 1302. This animal secreted less 
WAP into milk than those that abrogated lactation. Agalactia 
has not been observed in transgenic mice that secrete into 
their milk heterologous milk proteins (24, 34) or pharmaco- 
logically active proteins (20, 35) at levels similar to or 
exceeding those described here with swine. Experiments are 
in progress to determine whether the premature termination 
of lactation exhibited by some of the pigs is associated with 
mammary gene expression. 

Note Added in Proof. We have generated transgenic mice with the 
7.2-kb WAP transgene described in this paper and observed that 
some of the animals cannot maintain lactation (T. Burdon, R.J.W., 
and L.H., unpublished data). 

We thank Floyd Schanbacher for purified mouse WAP, Leah 
Schulman and Mark Spencer for technical assistance, Jim Piatt for 
animal care, and William Jakoby for continued support. 
the business of xenotransplantation 

past and present 



In early 1996, an analyst at the Salomon Brothers investment firm detailed 
like never before the "unrecognised potential of xenotransplantation": a 
$6 billion market in transgenic organs by 2010. The report was read by 
big and small investors alike: biotech venture capitalists (who pumped 
money into xenotransplant research), as well as newspaper personal 
investment columnists who featured companies like Imutran, Nextran, 
Alexion, and BioTransplant as "hot picks." In these heady times, some 
companies were even suggesting that we might each have our own 
Astrids, "self pigs" custom-made from our own DNA, "immunological 
twins" available for any spare parts we might need in the course of our 
lives. 
Some five years later, the big profits have not yet been realized. In late 
2000, a number of the original players in the xeno business reorganized 
their efforts for the next phase of research and development: Most 
notably, Novartis, the Swiss pharmaceutical giant, merged Imutran with 
BioTransplant to form a new company, Immerge BioTherapeutics; the 
new company is allied with another Novartis-funded company, Infigen, 
for use of Infigen's patented cloning technology. While this move was 
said to reflect a new commitment to xenotransplantation by Novartis, 
another company's "refocusing" seemed to start with a vote of no 
confidence for pig-to-human transplants: In August of 2000, PPL 
Therapeutics, a Scottish company that set out to commercialize the 
"Dolly" cloning technology, lost "considerable funding" for its 
xenotransplantation program from Geron, a California-based company 
that had been PPL's largest xeno backer. PPL and Geron both denied 
that the move should be construed as any kind of judgement on the 
viability of pig-to-human transplants, but PPL has had difficulty finding 
a new partner for its xeno program. 

Here's a look at the major players in the xeno business, past and present: 



imutran 



This small biotech start-up in Cambridge, England took the early lead in 
the race toward the organ farm: In December of 1992, at a farm in 
Cambridgeshire, they created "Astrid," the world's first transgenic pig, 
who carried human genes within her organs to help prevent rejection by 
the organ recipient's immune system (one of the thorniest problems 
facing xenotransplantation). A year and a half later, Imutran announced it 
had produced several generations of Astrids who might be eligible for 
human trials by 1996. Started by a handful of scientists, the company 
received early funding from Sandoz pharmaceutical company whose 
profits were heavily derived from an immunosuppressive drug key to 
successful transplants. Sandoz eventually purchased Imutran outright. 
In 1996, Sandoz merged with Ciba-Geigy to form Novartis. 

In January, 2001, Imutran completed the relocation of its 
xenotransplantation research to Charlestown, MA, combining with 
BioTransplant (another Novartis-funded company) to form Immerge 
BioTherapeutics. The company denies that the move was motivated by 
animal rights protests in the UK. 



In the early 1990s, a small biotech company in Princeton, NJ, the DNX 
Corporation, emerged as one of Imutran's chief rivals. Using a farm in 
Albany, Ohio, Nextran successfully produced transgenic pigs whose 
hearts survived for impressive lengths of time in baboons; they were 
also far along in developing pig livers as filter "bridge" organs for 
people awaiting transplants. In late August, 1994, the Baxter Health 
Care Corp. of Deerfield, IL partnered with DNX to form a new 
company, NEXTRAN, with Baxter owning 70% of the partnership. At 
the time of the formation of Nextran, Baxter's biggest revenue-generator 
had come from its dialysis equipment, so it took a special interest in 
DNX, which had been developing transgenic kidneys that might one 
day make dialysis less necessary. In 1995, Nextran became the first to 
win FDA approval for human clinical trials involving transgenic pig 
livers. 

In FRONTLINE's report, a Nextran pig saved Robert Pennington's life. 
It was used outside his body as a temporary "bridge" to filter 
Pennington's blood while he waited for a human liver transplant. 
Nextran also is involved in trying to solve the hyperacute 
rejection problems facing pig-to-human transplants. 



alexion 



Formed in 1992 by a group of Yale University scientists, Alexion was 
one of the early innovators in finding transgenic solutions to hyperacute 
rejection in transplant organs. Though initially focused on creating 
organs (their pigs were grown on farms in West Virginia and 
Massachusetts), Alexion has had some of its greatest success with 
implantation of pig nerve cells to repair spinal cord damage. In late 
1998, Alexion made headlines worldwide for successfully repairing 
severed spinal cords in rats and monkeys using pig cells. Alexion's 
clinical trials continue. 



ppl therapeutics 



Based in Edinburgh, PPL Therapeutics is licensed to commercialize the 
cloning technology pioneered by the Roslin Institute which surprised 
the world in 1997 with its creation of "Dolly," the first cloned mammal. 
In 1998, the company moved its xenotransplantation program to 
Blacksburg, Virginia, where scientists affiliated with Virginia 
Tech University were already involved in the research. In March of 
2000, PPL's Blacksburg laboratory announced the creation of the 
world's first cloned pigs. (Later in the year, Wisconsin-based Infigen 
would be the first to clone transgenic pigs; these pigs were first shown 
nationally in FRONTLINE's report.) 

In August of 2000, PPL Therapeutics' xenotransplantation program lost 
"considerable funding" from its major backer, Geron corporation of 
Northern California, who cited a change in "strategic priorities" and a 
desire to concentrate on stem cell work. PPL executives as well as the 
director of Edinburgh's Roslin Institute issued press releases denying 
that the Geron move was a vote of no confidence for xeno: "The 
institute has had a research programme on pig cloning, one application 
of which would be the use of pig organs for xenotransplantation. While 
xeno has raised a number of well-publicised issues, such as possible 
infection with pig viruses, these were not the basis for the decision to 
refocus the funding." PPL continued to look for partners through the 
Fall of 2000, but negotiations broke down, largely due to questions 
about the value of PPL's xeno program. 

In early 2001, PPL's Blacksburg, VA lab announced that it had secured 
new funding, not for xeno, but for stem cell research. It's too soon to 
tell whether this is one company's story, or a cautionary tale for the 
industry. 



biotransplant 



Founded in 1990 and taken public with a stock offering in 1996, 
BioTransplant was one of the early pioneers of xenotransplantation. 
Like their rivals, BioTransplant focused on overcoming the hyperacute 
rejection problem, basing their approach on the bone marrow research 
of Dr. David Sachs. In August, 2000, the company, which is partnered 
with Massachusetts General Hospital, announced a breakthrough in 
breeding transgenic pigs that would not transmit pig viruses, or PERV's. 

In January of 2001, BioTransplant spun off its xenotransplantation 
program, partnering with Novartis (and the former Imutran) in a new 
company, Immerge BioTherapeutics, but keeping their offices in the 
Charlestown Naval Shipyard. 



infigen 



Infigen was created in 1997 to commercialize the animal cloning 
techniques developed at American Breeder Service (ABS Global Inc.), 
a DeForest, Wisconsin company which is part of W.R. Grace. (ABS 
describes itself as "the world's leading provider of bovine reproductive 
services and technologies," a global marketer of dairy and beef cattle 
semen.) In January of 1999, Infigen and Imutran (Novartis) formed a 
working alliance that guaranteed Infigen's funding in exchange for use 
of the company's patented nuclear transfer cloning techniques. 

In his FRONTLINE interview, Michael Bishop, PhD, Infigen's president, 
explains how genetically modified pigs can be created and cloned. 



diacrin 



Founded in 1990, Diacrin became a public company in early 1996, after 
the FDA gave the company approval for the first-ever clinical trials of 
transplanted pig cells into humans. Later in 1996, Diacrin entered a 
joint venture with Genzyme to develop two products using pig neural 
cells. 

On March 16, 2001 Genzyme and Diacrin reported that a preliminary 
analysis of outcomes of Phase II trials for Parkinson's patients found pig 
neuro cell transplants did not necessarily work better than a placebo 
treatment. The results are likely not the end of the research trial, but the 
news triggered a significant drop in stock prices. Jim Finn, a Parkinson's 
patient featured in FRONTLINE's report, was part of a Phase I trial. 
Other Diacrin/Genzyme Phase I patients featured in FRONTLINE's 
report, Maribeth Cook and Amanda Davis, were stroke patients. 



immerge biotherapeutics 



Beginning operations in January of 2001, Immerge BioTherapeutics is a 
new company formed from the UK's Imutran and the xeno division of 
the Boston-based BioTransplant company. Unlike the companies from 
which it was formed, Immerge is focused squarely on development of 
cells, tissues, and organs for xenotransplantation, and not on drug 
therapies or other transgenics. 
Transgenic chickens: insertion of retroviral genes into the chicken 
germ line. 

Salter DW, Smith EJ, Hughes SH, Wright SE, Crittenden LB. 

We infected early chicken embryos by injection of wild-type and recombinant 
avian leukosis viruses into the yolk of unincubated, fertile eggs. The viremic 
males (designated generation 0 [G-0]) were tested for transmission of proviral 
DNA to their G-1 progeny. Nine of 37 G-0 viremic males were mosaic and 
proviral DNA was transmitted to their progeny at frequencies varying from 1 to 
11%. All of the G-1 progeny examined by restriction enzyme analysis for 
clonality of proviral junction fragments had one to three simple but different 
fragments. The proviral DNA was transmitted from G-1 to the G-2 progeny in a 
Mendelian fashion thus proving that retroviral genes have been inserted into the 
chicken germ line. One of the viruses is a candidate vector for insertion of 
foreign genes into the chicken germ line. 

PMID: 3029962 [PubMed - indexed for MEDLINE] 
Replication-Defective Vectors of Reticuloendotheliosis Virus 
Transduce Exogenous Genes into Somatic Stem Cells 
of the Unincubated Chicken Embryo 
Replication-defective vectors derived from reticuloendotheliosis virus were used to transduce exogenous 
genes into early somatic stem cells of the chicken embryo. One of these vectors transduced and expressed the 
chicken growth hormone coding sequence. The helper cell line, C3, was used to generate stocks of vector 
containing about 10^4 transducing units per ml. Injection of 5- to 20-μl volumes of vector directly beneath the 
blastoderm of unincubated chicken embryos led to infection of somatic stem cells. Infected embryos and adults 
contained unrearranged integrated proviral DNAs. Embryos expressed the transduced chicken growth 
hormone gene and contained high levels of serum growth hormone. Blood, brain, muscle, testis, and semen 
from individuals injected as embryos contained vector DNA. Replication-defective vectors of the 
reticuloendotheliosis virus transduced exogenous genes into chicken embryonic stem cells in vivo. 



Insertion of genetic information into the chicken provides 
a new in vivo approach to analyzing gene expression and its 
effects on avian physiology. A vector derived from Rous 
sarcoma virus has been used to transfer additional growth 
hormone genes into chicken somatic cells by infection of 7- 
and 9-day-old embryos (35). More recently, gene transfer 
into chicken germ cells (27-29) has been accomplished by 
infection of day-old embryos with similar replicating Rous 
sarcoma virus vectors (18, 33). This approach to avian gene 
transfer has advantages over DNA microinjection since the 
early chicken zygote is difficult to manipulate and even a 
freshly laid egg contains thousands of cells (10, 20). How- 
ever, replicating retroviral vectors have disadvantages. They 
can result in gene transfer to susceptible cells at various 
stages of differentiation long after initial infection of the 
embryo. This can make it difficult to determine the stage of 
development at which gene insertion takes place or the cell 
lineage relationships within fully differentiated tissues. Fur- 
thermore, replicating vectors also increase the potential for 
disease states associated with chronic viral infection (16, 24, 
38). 

Replication-defective retroviral vectors offer an alternative 
approach (2, 6, 21, 36, 40). Such vectors, derived from 
reticuloendotheliosis virus type A (REV-A) (31), are pro- 
duced by the helper cell line C3 which contains a packaging- 
defective helper provirus (40). When transfected with a 
defective proviral vector, this helper cell assembles infec- 
tious replication-defective vector but little or no competent 
virus (17). Both replicating REV and the replication-defective 
REV vector ME111 have been previously used for gene 
transfer into chicken somatic cells by injection of virus into 
follicles before ovulation (32). We have used a method of 
gene transfer based on microinjection of vector into early 
embryos. 

This report describes the transfer of new genetic informa- 
tion, including additional chicken growth hormone (cGH) 
coding sequences, into somatic stem cells of the chicken 
embryo. Chickens do not generally contain endogenous 
REV and express endogenous cGH only in the pituitary 
during late embryogenesis and after hatching (15, 19, 30). In 
vivo, these vectors can infect somatic stem cells of day-old 
chicken embryos, resulting in precociously high levels of 
circulating cGH and the presence of vector DNA in a variety 
of adult somatic tissues. 



MATERIALS AND METHODS 

Cells. The REV-A helper cell line, C3, was generously 
provided by H. Temin (40). C3 cells were cultured in 
minimal essential medium (Eagle) containing 7% fetal calf 
serum-400 μg of G418 per ml. Chicken embryo fibroblasts 
(CEF) were grown in F-10 medium supplemented with 10% 
tryptose phosphate broth-5% calf serum. D17 cells were 
cultured in minimum essential medium (Eagle)-7% fetal calf 
serum (40). Buffalo rat liver thymidine kinase (TK)-negative 
(BRLtk-) cells were grown in minimum essential medium 
(Eagle) plus 7% calf serum (39). QT-6 cells were obtained 
from C. Moscovici and grown as described previously (23). 

Virus infection. BRLtk- cells were infected in medium 
containing 100 μg of Polybrene per ml. CEF were infected in 
normal medium. Cells were usually exposed to virus over- 
night. 

Vectors. The ME111 vector has been previously described 
(8). The vector SW272/cGH was derived by insertion of cGH 
cDNA downstream of the 5' long terminal repeat (LTR) of 
the SW272 vector (39). 

Vector assays. TK transducing units (TKTU) released by 5 
x 10"* C3 helper cells stably transfected with vector SW272/cGH 
were harvested after 6 h of incubation and were 
assayed by infection of 10' BRLtk- cells. TK-positive cells 
were selected for growth in medium (40) containing 1 x 10"'* 
M hypoxanthine, 3 x 10"* M thymidine, and 5 x 10"' M 
methotrexate. 

Plasmids and chicken DNAs. All plasmid DNAs were 
propagated by using derivatives of pBR322 and the HB101 
strain of Escherichia coli. The plasmid pSW272 contains a 
derivative of the spleen necrosis provirus (SNV), lacks most 
sequences encoding the structural genes of the virus, and 
contains the herpes simplex virus type 1 (HSV-1) tk gene 
(40). Chicken genomic DNAs were isolated from Arbor 
Acres males of meat breeding lines. 

Nucleic acid isolation. Chicken embryo DNA was prepared 
by solubilizing tissue in buffer containing 100 mM EDTA, 
1% sodium dodecyl sulfate, 100 μg of proteinase K per ml 
(pH 8). Samples were incubated at 60°C for 15 min, then at 
37°C with additional protease (100 μg/ml) for 4 h. The DNA 
was sheared, adjusted to 200 mM NaCl, and extracted twice 
with equal volumes of phenol and chloroform-isoamyl alcohol 
(24:1) and once with 2 volumes of chloroform-isoamyl 
alcohol. DNA was ethanol precipitated and dissolved in 0.01 
M Tris-0.001 M EDTA (pH 8.0). Unsheared DNA was used 
for Southern blot analysis (34). 

Nucleic acid analysis. DNA samples were applied to Gene 
Screen Plus membranes (New England Nuclear Co.) for dot 
blot analysis by means of 96-well plexiglass manifolds. DNA 
on membranes was denatured in 1.5 M NaCl-0.5 M NaOH 
for 15 min, neutralized in 0.5 M Tris (pH 7.5)-1.5 M NaCl for 
1 min, blotted dry, and baked at 80°C for 30 min. Hybridizations 
were carried out as already described (17). Radiolabeled 
DNA probe was prepared by the method of random 
priming (13). Southern blot analysis was performed as de- 
scribed previously (34). 

cGH analysis. cGH expression was analyzed either by 
radioimmunoassay (RIA) (35) or by Western immunoblotting 
(3). 

Transfection. The REV-derived helper cell line, C3, was 
transfected as previously described (14) with the plasmids 
pSW272/cGH and pHyg (37). Transfected cells were selected 
for 10 to 14 days in medium containing 200 μg of 
hygromycin per ml. 

Embryo injection. Shell was removed from the area above 
the blastoderm of unincubated eggs. A Narishige micromanipulator 
and a 25-μl Drummond pipette fitted with a glass 
needle were used to inject 5- to 20-μl volumes of cell culture 
medium containing vector directly beneath the exposed 
blastoderm. The titer of vector was about 10^4 TKTU/ml as 
measured on BRLtk- cells. The relative titer of this vector 
on chicken embryo cells in vivo is unknown. Eggs were 
resealed with a patch of shell membrane which was covered 
with Devcon Duco cement and allowed to dry. Eggs were 
incubated at 37.8°C. 

RESULTS 

Vectors ME111 and SW272/cGH. The sequence relationships 
among SNV, ME111, the cGH transducing vector 
SW272/cGH, and the packaging-defective helper proviruses 
present in C3 helper cells are shown in Fig. 1. ME111 has 
been described in detail elsewhere (8). The parental vector 
SW272 is derived from SNV and contains the HSV-1 tk gene 
and promoter in the same transcriptional orientation as the 
viral promoter (39). The cGH coding sequence was originally 
derived from a cDNA clone made from chicken pituitary 
mRNA (35). A DNA fragment XbaI to NcoI contains the 
complete coding sequence of the cGH gene but lacks the 
poly(A) addition signal present at the 3' end of the cDNA. 
Using Klenow reagent and blunt-end ligation, the cGH 
sequences were inserted into the unique XbaI site within 



pSW272 located just downstream of the viral 5' splice donor 
and packaging sequence, 555 nucleotides from the 5' end of 
the viral RNA transcript (39). The orientation of the cGH 
coding sequence is the same as that of the viral sequences. 
Proceeding from the 5' end of the proviral RNA transcript of 
SW272/cGH, the first ATG encountered codes for the N- 
terminal methionine of cGH. SW272/cGH is designed to 
express cGH mRNA transcripts from the viral promoter. 

Transduction and expression of REV vectors in vitro. 
Careful screening of the C3 helper cells transfected with 
pSW272/cGH and pHyg yielded clone C3-44 which released 
2 x 10^4 TKTU/ml into growth medium but very low levels of 
competent virus. Competent REV in these cultures, as 
estimated by infection of cultured CEF, was about 10 
infectious units of REV per ml or less (17). Western blot 
analysis of cGH released by C3-44 cells revealed a predominantly 
single band of protein which comigrated with purified 
recombinant cGH (Fig. 2). The observed molecular size of 
cGH was about 23,000 daltons. The estimated concentration 
of cGH in a 72-h harvest of medium of clone C3-44 was at 
least 500 ng/ml (data not shown). CEF infected with vector 
released >40 ng of cGH per ml of growth medium as 
determined by RIA 3 days after infection (data not shown). 
Western blot analyses of cGH released by cell lines infected 
with the SW272/cGH vector are shown in Fig. 2, lanes 13 
through 18. Cell lines B56 and B20 derive from the canine 
cell line D17. Cell lines QT82, QT54, QT15, and QT8 derive 
from the quail cell line QT-6. All of these cells release cGH 
having the same apparent molecular size as purified recombinant 
cGH (23 kilodaltons) (35). Approximate levels of cGH 
expression varied from 2 to 10 ng/ml. 

Analysis of DNA from chicken embryos after vector infection. 
Tissue culture fluid (20-μl volumes) containing the 
vector SW272/cGH was injected beneath the blastoderms of 
unincubated chicken embryos. Total embryonic DNA was 
isolated from vector-injected and uninjected control em- 
bryos after 7 days of development and was analyzed by 
qualitative dot blot hybridization with either a radiolabeled 
cGH probe (Fig. 3A) or a REV vector probe (Fig. 3B). The 
cGH probe was used to demonstrate that sufficient DNA was 
present on the filter for detection of vector sequences 
present at low copy number. Of 25 injected embryos, 13 
(52%) hybridized to a radiolabeled probe of vector DNA, 
whereas control DNA from uninjected embryos did not. 

To confirm the presence and correct genome organization 
of vector sequences in infected 7-day embryos, high-molecular-size 
DNAs from 10 vector-containing embryos were 
digested with BamHI endonuclease and subjected to Southern 
blot analysis (34) (Fig. 4). The embryo DNAs examined 
included those from Fig. 3B, rows 1a, 2a, 6a, 7a, and 8a. 
Internal BamHI fragments predicted from the cGH vector 
sequence are diagrammed in Fig. 1. Digestion of integrated 
proviral vector sequences of SW272/cGH should yield DNA 
fragments internal to the provirus of 0.86, 2.3, and 1.6 
kilobase pairs (kb). A 5' junction fragment containing the 5' 
LTR of the vector linked to host cellular sequences adjacent 
to the integration site might also be detected. No 3' junction 
fragment containing host DNA sequences would be detected, 
because a BamHI restriction endonuclease site is 
located at the 3' end of the proviral LTR. As shown in Fig. 
4A, lanes 3 to 7 and 12 to 16, DNAs from these vector-infected 
embryos show the expected BamHI DNA fragments 
of 0.86, 2.3, and 1.6 kb when analyzed with a probe 
derived from the complete SW272 plasmid DNA, which does 
not contain cGH sequences. The absence of detectable 
BamHI fragments containing the junction of cellular DNA 
and integrated vector DNA indicates multiple sites of vector 
FIG. 1. Sequence relationships among the parental SNV provirus, the modified packaging-defective helper proviruses, and the vectors 
ME111 and SW272/cGH are shown. Relevant features of these proviruses include the LTRs, the structural genes of the virus (gag, pol, env), 
the approximate position of the packaging sequence (E), the cGH sequences, the HSV-1 tk gene promoter (TKp), the tk coding sequence 
(TK), and the neomycin phosphotransferase coding sequence (NEO). The env sequence in the larger of the two helper proviruses is 
presumably not expressed because of removal of the 5' splice donor. Overlapping deletions indicated between helper and vector sequences 
should reduce recombination between these genomes. A description of the REV helper proviruses and the original TK transducing vector 
pSW272 and ME111 have been given (8, 40). The 5' LTRs of both helper proviruses derive from SNV. Their coding sequences derive from 
REV-A. The env helper provirus lacks viral splice donor and acceptor sequences. The first ATG is that of the env gene. The cGH vector 
derives from SNV. REV-A and SNV share high sequence homology. Relative sizes (in kilobases) of BamHI restriction endonuclease 
fragments are indicated. Also given are the locations of viral, vector, TK, and cGH DNA probes. 



provirus integration during infection of early embryonic 
cells. No 0.57-kb BamHI fragment predicted from the structure 
of unintegrated circular forms of either the vector DNA 
or helper virus DNA was observed. No 1.4-kb fragment 
diagnostic of the 5' end of integrated replication-competent 
proviral SNV DNA was observed (39) (see Fig. 1). BamHI-digested 
DNA from uninjected whole embryos or from blood 
of uninjected chickens did not hybridize to the vector probe 
(Fig. 4A, lanes 2, 8, 11, and 17, respectively). 
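
The fragment predictions above follow from the spacing of restriction sites within the integrated provirus. The sketch below is a minimal illustration of that reasoning, not the authors' analysis: the BamHI site coordinates and the function name are hypothetical, chosen only so that the spacings reproduce the 0.86-, 2.3-, and 1.6-kb internal fragments named in the text. The actual map is the one in Fig. 1, and the real order of the fragments may differ.

```python
def internal_fragments(cut_sites_kb):
    """Return sizes (kb) of fragments lying between consecutive cut sites.

    Only fragments bounded by two sites inside the provirus are returned.
    A junction fragment runs from the outermost site into host DNA, so its
    size depends on where the provirus integrated and is not computed here.
    """
    sites = sorted(cut_sites_kb)
    return [round(b - a, 2) for a, b in zip(sites, sites[1:])]


# Hypothetical BamHI positions (kb from the 5' end of the provirus).
hypothetical_bamhi_sites = [1.0, 1.86, 4.16, 5.76]
print(internal_fragments(hypothetical_bamhi_sites))
```

Because the internal fragments are the same at every integration site, they appear as discrete bands, whereas a 5' junction fragment would differ in size for each integration event, which is why many independent integrations yield no single detectable junction band.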

After removal of the SW272 probe (Fig. 4B), the same 
filters were hybridized with a viral probe specific for the 
structural genes of REV to detect the presence of replication-competent 
virus (Fig. 4C). The parental SNV and 
REV-A proviruses used to derive the helper cell and vectors 
described here contain internal BamHI fragments of 1.4, 1.8, 



2.2, 0.7, and 1.6 kb (see Fig. 1). Only the 1.6-kb fragment 
would not be detected by the virus-specific probe (Fig. 1) 
used in this analysis. No virus-specific BamHI fragments 
were observed, indicating that endogenous and exogenous 
REV sequences were not detectable (Fig. 4C). Although this 
result does not rule out the presence of competent helper 
virus, it shows that efficient gene transfer takes place via the 
replication-defective SW272/cGH vector. The dot blot on 
the right of panel C contains various quantities of plasmid 
pSW253 which carries the entire REV provirus (5). 

FIG. 2. Western blot analysis of cGH. Lanes: 1 to 3, 250, 125, and 30 ng per
lane, respectively, of purified recombinant cGH; 4, 10 µl of conditioned QT-6
medium; 5 to 9, immunoprecipitated cGH from various volumes of serum from
15-day vector-injected embryos, birds 18 (80 µl), 2 (100 µl), 6 (60 µl),
30 (100 µl), and 27 (200 µl), respectively (see Table 1); 10 to 12, sample
buffer alone; 13 to 18, 10 µl of conditioned medium from clones of
SW272/cGH-infected D17 and QT-6 cells (clones B56, B20, QT82, QT54, QT15, and
QT8, respectively). Molecular weights (MW x 10') are shown at the left.

The filters shown in Fig. 4C were washed to remove probe (Fig. 4D) and were
reanalyzed with a cGH-specific probe (Fig. 4E). The fragments of 0.86 and
2.3 kb in lanes 3 to 7 and 12 to 16 are the predicted cGH-containing vector
sequences described in Fig. 1. The two bands (asterisks) of approximately
6.4 kb and approximately 2.7 kb, which are common to all lanes, represent
BamHI fragments derived from the endogenous cGH gene. As expected, embryo DNAs
in lanes 3 to 7 and 12 to 16 contain all four fragments derived from both the
vector and endogenous gene. The 1.6-kb BamHI fragment present in lanes 3 to 7
and 12 to 16 of Fig. 4A is missing in Fig. 4E, because this fragment does not
contain cGH sequences.
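
The band assignments given for the cGH probe can be summarized as a small lookup. The Python sketch below is illustrative only: the fragment sizes and their sources are taken from the text, while the function names and the 0.15-kb matching tolerance are assumptions introduced here and are not part of the reported method.

```python
# Expected BamHI fragments hybridizing to the cGH probe, as listed in the text.
CGH_PROBE_BANDS_KB = {
    0.86: "vector (SW272/cGH)",
    2.3: "vector (SW272/cGH)",
    2.7: "endogenous cGH gene",
    6.4: "endogenous cGH gene",
}


def interpret_band(size_kb, tolerance_kb=0.15):
    """Return the source of a band seen with the cGH probe.

    tolerance_kb is a hypothetical matching window, not a value from the text.
    """
    for expected_kb, source in CGH_PROBE_BANDS_KB.items():
        if abs(size_kb - expected_kb) <= tolerance_kb:
            return source
    return "unassigned"


def has_vector(band_sizes_kb):
    """True if any observed band matches a vector-derived fragment."""
    return any(interpret_band(s).startswith("vector") for s in band_sizes_kb)
```

Under these assumptions, a lane showing only the 2.7- and 6.4-kb bands would be read as uninjected, and a lane showing all four bands as carrying vector. The 1.6-kb fragment is returned as "unassigned" because it does not contain cGH sequences.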

Dot blot hybridization of DNA from brain, liver, and 
muscle of four 14-day embryos infected before incubation 
showed that two of the four embryos contained vector- 
specific sequences in all three tissues. One embryo con- 
tained vector sequences in liver and muscle only, and one 
embryo was negative (Fig. 3). 

Analysis of serum cGH. Circulating levels of cGH were determined by RIA of
serum from thirty 15-day-old embryos infected with vector before incubation
(Table 1). Concentrations of cGH in serum from 16 of 30 injected embryos (55%)
were at least 10 times the level in uninjected control embryos, and they
ranged from 18 to 254 ng/ml. All 35 control embryos contained less than 2 ng
of detectable serum cGH per ml. Western blot analysis of cGH immunoprecipitated
from serum of a number of these embryos is shown in Fig. 2, lanes 5 to 9. The
amount of cGH present in serum from infected embryos is similar to the amount
of cGH produced in vitro by infected culture cells.
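
The comparison of injected embryos against a cutoff can be reproduced from the Table 1 values. The sketch below is a hedged illustration: the numbers are transcribed from Table 1, the entry for bird 14 is an assumed reading (1.1) of a poorly printed value, and the cutoff is a free parameter because the text does not state the exact control level used for the 10-fold comparison.

```python
# Serum cGH (ng/ml) for the 30 vector-injected embryos, transcribed from Table 1.
# Bird 14 is printed unclearly in the source; 1.1 is an assumed reading.
INJECTED_NG_PER_ML = [
    51, 180, 100, 0.9, 41, 200, 2.6, 80, 44, 106,
    4.5, 18, 8.6, 1.1, 0.92, 2.2, 0.8, 254, 10.8, 240,
    168, 32, 56, 12, 42, 0.86, 0.70, 3.4, 1.4, 0.76,
]


def count_elevated(values, cutoff_ng_per_ml):
    """Count embryos whose serum cGH is at or above the chosen cutoff."""
    return sum(1 for v in values if v >= cutoff_ng_per_ml)


def summarize(values, cutoff_ng_per_ml):
    """Return (number elevated, total, percentage elevated)."""
    n = count_elevated(values, cutoff_ng_per_ml)
    return n, len(values), 100.0 * n / len(values)
```

With the values as transcribed here, a cutoff of 12 ng/ml gives 16 embryos and a cutoff of 18 ng/ml gives 15, so the count is sensitive to where the cutoff is placed.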

Vector sequences in adult chickens. Southern blot analysis of DNA isolated
from blood, brain, muscle, and testis of an adult chicken (no. 87725) which
had been injected as an embryo with the ME111 vector is shown in Fig. 5. DNAs
were digested with BamHI and BglII before analysis. The four different probes
used hybridized with the REV sequence present in the vector, HSV-1 tk
sequences of the vector, REV structural gene sequences (absent from the
vector), or endogenous cGH genes. All analyzed DNAs from bird 87725 contained
the predicted DNA fragments of 0.74 and 1.6 kb recognized by the REV vector
probe and fragments of 1.2 and 1.7 kb recognized by the tk probe (Fig. 5A and
C, respectively). DNA from blood and brain contained additional hybridizing
fragments which probably include junctions between vector and cellular DNA at
sites of integration (Fig. 5A, lanes 2 and 3). No REV-specific bands were
observed in any of these tissue DNAs (Fig. 5B). Hybridization with cGH probe
revealed endogenous fragments of ~2.7 and ~6.4 kb (Fig. 5D). D17 cells
cocultivated with blood taken from bird 87725 at 4 weeks of age were reverse
transcriptase-negative after 4 weeks of culture and did not produce detectable
tk gene-transducing activity. Of 14 similarly derived birds, 2 were virus
positive as determined by the same assay (17). Although the presence of low
levels of replicating REV in birds like no. 87725 cannot be ruled out, these
results are consistent with infection of embryonic stem cells with
nonreplicating REV vectors.

FIG. 3. Dot blot hybridization analysis of chicken embryo DNA. (A) Chicken
embryo DNA hybridized with a cGH probe. Rows: a, b, and c1, total DNA from
7-day embryos injected with 20 µl of vector C3-44; c2 to c12 and d1 to d7 and
d9, DNA from uninjected control embryos; d10 to d12, e1 to e3, e4 to e6, and
e7 to e9, DNA from brain, liver, and muscle of four 14-day embryos injected
with vector SW272/cGH; f1 to f4, chicken blood DNA (~15 µg) mixed with 1/10
dilutions of vector DNA starting with 1 ng in spot f1; f5, chicken blood DNA
only; g1 to g4, yeast tRNA (5 µg) mixed with the same amounts of vector DNA
present in rows f1 to f4; g5, yeast tRNA only. (B) Same as in panel A, except
filters were hybridized with a vector-specific probe (see Fig. 1). Row e12 is
the same as row a1. Approximately 15 to 30 µg of total embryo DNA was applied
to each spot, using a 96-well blotting apparatus.

FIG. 4. Southern blot analysis of DNA from 7-day chicken embryos injected with
20 µl of SW272/cGH vector before incubation. High-molecular-size DNA (15 µg)
was digested with BamHI before analysis. The same filter was hybridized to
three different probes: pSW272 probe (A), probe removed from panel A (B),
virus-specific probe (C), probe removed from panel C (D), and cGH-specific
probe (E). Probe hybridized to vector DNA in lanes 9 and 10 of panel A could
not be completely removed. Sequences recognized by these probes are
illustrated in Fig. 1. Lanes: 1 and 18, HindIII-digested lambda phage DNA,
HaeIII-digested φX174 DNA, and BamHI-digested uninjected chicken blood DNA;
2 and 11, DNA from uninjected embryos; 8 and 17, DNA from blood of uninjected
chickens; 3 to 7 and 12 to 16, DNA from vector-injected embryos; 9 and 10,
BamHI-digested DNA of pSW272/cGH (1 ng) plus uninjected chicken blood DNA.
BamHI fragments internal to the proviral vector are marked with arrows in
panel A. BamHI fragments containing the endogenous cGH sequence are marked by
asterisks in panel E. Dot blot on the right of panel C contains the indicated
amounts of pSW253 containing the REV-A provirus (5). Sizes are shown in
kilobase pairs (kb).

Southern blot analysis of DNA from semen and blood of SW272/cGH-positive and
control birds is shown in Fig. 6. Filters containing BamHI-digested DNAs were
hybridized with radiolabeled DNA probes for the cGH coding sequence, 5' and 3'
vector-specific sequences, or REV virus probe (see Fig. 1). Lane 1 of each
panel contains a mixture of bacteriophage lambda and φX174 DNAs digested with
HindIII and HaeIII, respectively, and BamHI-digested control chicken blood
DNA. Lane 9 in Fig. 6A to C contains BamHI-digested control chicken blood DNA
and plasmid DNA of pSW272/cGH. Internal BamHI fragments of vector DNA are
indicated by arrows. BamHI fragments derived from the endogenous cGH gene are
shown by the asterisks. Internal fragments derived from the complete REV
provirus are marked by chevrons. Results obtained by hybridization with the
cGH-specific probe are shown in Fig. 6A. BamHI-digested DNAs from control
semen and blood (lanes 3 and 6, respectively) contain endogenous fragments of
~2.7 and ~6.4 kb. In lanes 4 and 7, DNAs from the semen and blood,
respectively, of the vector-positive male contain an additional 0.86-kb
fragment which derives from the vector SW272/cGH and hybridizes to the cGH
probe. Although visible in the original autoradiogram, the 2.3-kb BamHI
fragment derived from the vector is not well resolved from the strongly
hybridizing 2.7-kb BamHI fragment derived from the endogenous cGH gene. Blood
DNA appears to contain much less of the 0.86-kb fragment than does semen DNA.

Panel B of Fig. 6 shows results obtained when a similar filter was hybridized
with the vector probe. Lanes 4 and 7 show that semen and blood DNA from an
infected male contain BamHI fragments of 0.86 and 1.6 kb. These fragments
derive from the 5' and 3' ends of the integrated vector DNA, respectively. The
additional 2.3-kb internal BamHI fragment of vector DNA containing HSV-1 tk
sequences does not hybridize to the vector probe used in Fig. 6B, nor does
BamHI-digested DNA from semen and blood of uninfected control birds (lanes 3
and 6). No 1.4-kb fragment characteristic of replicating REV was observed. The
patterns of semen and blood BamHI DNA fragments hybridizing with these probes
are similar to each other and are consistent with the pattern observed in
BamHI-digested DNA from infected embryos (Fig. 4).

Panel C of Fig. 6 shows results of hybridization with a virus-specific probe.
Lanes 1 to 9 are as described for panels A and B. Lane 10 is blank. Lanes 11
and 12 contain BamHI-digested plasmids pSW279 (39) and pSW253 (5),
respectively. DNA in lane 11 has a 5' LTR derived from SNV (with a BamHI site)
but a 3' LTR derived from REV-A (without a BamHI site). This provirus also
lacks a 310-base-pair packaging sequence (E) located near the 5' end of the
provirus. The expected fragments generated by BamHI digestion of this DNA are
present at ~1.1 (E−), ~1.8, ~2.2, and ~0.7 kb (visible at longer exposure
times). Plasmid pSW253 in lane 12 contains the REV-A provirus and lacks the
BamHI site present in the SNV LTR. BamHI digestion of this DNA generates the
observed ~1.8- and ~2.2-kb fragments. The large fragment of ~9 kb in lane 12
contains 5' and 3' portions of the provirus and a portion of the gag gene
sequence. The 0.7-kb fragment is observed at longer exposure times. DNAs in
lanes 3, 4, 6, 7, and 9 do not contain sequences detectable with probe derived
from the structural genes of REV.

TABLE 1. cGH levels in chicken embryo serum^a

     C3-44-injected embryos            Uninjected embryos
  Bird no.   Amt (ng/ml) of cGH     Bird no.   Amt (ng/ml) of cGH
     1             51                  31           <0.80
     2            180                  32           <0.80
     3            100                  33           <0.80
     4              0.9                34           <0.80
     5             41                  35           <0.80
     6            200                  36            0.85
     7              2.6                37           <0.80
     8             80                  38           <0.80
     9             44                  39           <0.80
    10            106                  40           <0.80
    11              4.5                41           <0.80
    12             18                  42            1.2
    13              8.6                43            1.0
    14              1.1                44           <0.8
    15              0.92               45           <0.8
    16              2.2                46           <0.8
    17              0.8                47           <0.8
    18            254                  48           <0.8
    19             10.8                49            1.1
    20            240                  50            1.2
    21            168                  51            1.2
    22             32                  52            0.9
    23             56                  53            1.2
    24             12                  54            0.9
    25             42                  55           <0.8
    26              0.86               56           <0.8
    27              0.70               57           <1.1
    28              3.4                58           <0.8
    29              1.4                59           <0.8
    30              0.76               60           <0.8
                                       61           <0.8
                                       62           <0.8
                                       63           <0.8
                                       64           <0.8
                                       65           <0.8

  ^a Embryos of unincubated eggs were injected with 10 µl of medium from
cultures of clone C3-44. After 15 days of incubation, serum from each embryo
was assayed by RIA for cGH.

DISCUSSION 

Early chicken embryo development. Fertilization and the first 24 h of chicken
embryonic development occur in the oviduct and uterus, concomitant with the
accretion of albumen and deposition of the eggshell. During this period,
attempts at gene transfer into the embryo must allow for surgical removal
after fertilization and either reintroduction to the oviduct or extensive
artificial culture (25, 26). Both of these approaches are technically
difficult. Alternatively, infection of the embryo just after oviposition
represents a strategy well suited to vector-mediated gene transfer. The embryo
at this stage is composed of at least 10,000 cells arranged in a disk-shaped
blastoderm, one to two cells thick and 2 to 3 mm in diameter (10, 20). The
day-old blastoderm floats on the yolk above a fluid-filled subgerminal cavity.

Previous studies have provided a detailed description of early chicken embryo
development (10, 20) and insights regarding the developmental potential of
cells comprising the embryonic blastoderm of a freshly laid egg (9, 11, 12,
22). Separated posterior and anterior portions of very young unincubated
blastoderms appear totipotent, with similar ability to form embryos in vitro
(10). The slightly older blastoderm exhibits cells of both upper epiblastic
and lower hypoblastic layers. Separated from the lower layer of cells, the
upper epiblastic layer retains its pluripotency, regenerates a new hypoblastic
layer, and can subsequently form an early-stage embryo in vitro. When
dissociated and grown in culture, epiblast cells form structures resembling
embryoid bodies formed by murine teratocarcinomas (22). The hypoblastic layer,
in contrast, survives but does not form embryonic structures (22).

All of the above observations suggest that successful infection of the early
blastoderm with REV vectors might result in gene transfer into pluripotent
embryonic stem cells. However, the REVs are primarily exogenous viruses in the
chicken (41). Even though infected dams can transmit the virus vertically to
their offspring by shedding virus into the egg (42), nucleic acid sequences
closely related to REV are not endogenous to the chicken genome (Fig. 4C). The
biology of virus-host interactions may preclude stable insertion of REV
sequences into the chicken genome under natural conditions. Insertion of
complete REV proviruses into the chicken genome could adversely affect
viability, but even defective proviruses appear to be absent from the chickens
analyzed in this study.

Infection of unincubated chicken embryo blastoderms. We 
have used replication-defective REV vectors ME111 and 
SW272/cGH to test the feasibility of retrovirus-mediated 
gene transfer in the chicken. The C3 helper cell line has been 
used to generate titers of about 10** infectious units per ml. 
The ME111 vector carries the Tn5 neomycin phosphotransferase 
gene and the HSV-1 tk gene and has been described 
previously (8). The vector SW272/cGH carries a cDNA 
sequence encoding the cGH mRNA and the HSV-1 tk gene 
(see Fig. 1). Clone C3-44 released about 10^ TKTU/ml and 
expressed about 500 ng of cGH per ml of culture medium. 
Analysis by RIA (35) (data not shown) and Western blotting 
(3) (Fig. 2) showed that cGH released by C3-44 and trans- 
duced by SW272/cGH is similar to natural cGH. 

Glass needles (40 to 60 µm diameter) were used to deposit 
medium containing vector directly above and below the 
surface of the unincubated embryonic blastoderm. This 
method resulted in successful transduction of vector se- 
quences into recipient embryos. Estimates of the amount of 
vector injected into the space beneath the blastoderm are 
based on the titer on BRLtk" cells (-10^ TKTU/ml) and the 
observation that REV titer on chicken cells could be 10- to 
100-fold higher (39). We estimate that between 10' and 10^ 
TKTU were injected per embryo. 

Vector DNA present in 7-day embryos. Dot blot analysis of 7-day embryo DNA
shown in Fig. 3 indicated that about 50% of injected embryos contained
detectable vector sequences. Three different radiolabeled probes were used in
Southern blot analysis of high-molecular-mass DNA from 10 embryos to
distinguish the REV structural genes, the SW272/cGH vector, and endogenous cGH
sequences from each other (see Fig. 1). Since most infected embryonic cells
are likely to have a single copy of vector, these blots indicate that a
significant percentage of the embryonic cells may carry vector sequences
7 days after infection. This is most evident in comparisons of endogenous and
vector-specific BamHI fragments hybridizing to the cGH probe (Fig. 4E). The
lack of BamHI fragments specific for replicating REV (Fig. 4C) confirms that
gene transfer is primarily the result of the replication-defective REV vector
and not of contaminating helper virus (17). These results show that early
embryonic cells are susceptible to REV infection and that they persist during
development, comprising a significant fraction of the 7-day embryo.

Expression of cGH in vector-injected embryos. Expression of the endogenous cGH
gene occurs late during embryonic development. Caudal cells of the pituitary
do not contain immunodetectable cGH until day 12 of embryonic development
(19), whereas detectable plasma cGH does not appear until day 17 of incubation
(15). Furthermore, response of the pituitary to the cGH secretagogue,
thyrotrophin-releasing hormone, is not seen until hatching (7). The absence of
endogenous REV and the restricted location and timing of endogenous cGH
expression facilitate the distinctions between endogenous and vector-encoded
genes and their products.

Expression of the cGH gene in vivo resulted in elevated serum cGH levels in
about 50% of injected embryos when measured after 15 days of development
(Table 1). Levels of serum cGH in 30 injected embryos varied from <1 ng/ml to
254 ng/ml, whereas none of the 35 uninjected controls had serum cGH levels
above 2 ng/ml. Immunoprecipitated serum cGH from infected embryos comigrated
with purified recombinant cGH as shown by Western blot analysis (Fig. 2). The
relative contribution of somatic tissues to circulating levels of cGH is not
known. These results are consistent with infection of embryonic stem cells
present in the blastoderm at the time of vector injection and expression of
vector-encoded cGH.

Vector DNA in tissues of adult males. Southern blot analysis of DNA from an
adult male injected as an embryo with ME111 demonstrated the presence of
vector in blood, brain, muscle, and testes (Fig. 5). Analysis of semen DNA by
Southern blotting confirmed the presence of integrated unrearranged vector
sequences in a low percentage of the sperm cells from a bird injected with
SW272/cGH (Fig. 6). The pattern of BamHI restriction fragments observed (0.86
and 1.6 kb) is consistent with that seen in Southern blot analysis of DNA from
infected embryos. Probe containing HSV-1 tk sequences revealed the additional
2.3-kb BamHI vector DNA fragment (data not shown) also seen in Fig. 5A. Vector
sequences present in semen are not caused by contaminating blood cells since
blood contains lower levels of vector DNA per microgram of total DNA, as shown
by Southern blotting. Furthermore, blood cells were not detected in
vector-positive semen subjected to microscopic examination, nor could vector
sequences be detected in negative control semen containing 1% vector-positive
blood from a different bird. We did not observe any consistent BamHI fragments
representing junctions between cellular and vector sequences, probably because
of the polyclonal makeup and low overall percentage of cells carrying the
vector. No 1.4-kb BamHI DNA fragment characteristic of replicating SNV was
observed (40) (see Fig. 1).

Previous work with replication-competent derivatives of Rous sarcoma virus
showed that infection of unincubated chicken embryos resulted in germ line
insertion of proviral DNA (28, 29). In contrast, the same approach using
competent REV resulted in somatic infection but did not lead to germ line
insertion of proviral DNA (29). Similarly, follicular injection of REV vectors
resulted in the presence of vector sequences in somatic cells, but germ line
transmission of these sequences was not demonstrated (32). We recently
confirmed germ line transmission of the replication-defective REV vector ME111
administered as described here to chicken embryos (1). Breeding studies are
now in progress to determine whether semen from the chickens infected as
embryos with defective REV vectors encoding cGH can accomplish germ line
transmission of the vector DNA to progeny.

FIG. 6. Southern blot analysis of BamHI-digested DNA from semen and blood of
SW272/cGH-infected and control chickens. (A and B) Duplicate blots hybridized
with radiolabeled probe specific for cGH or 5' and 3' regions of the vector,
respectively. See the legend to Fig. 1 for a detailed description of probes.
Lanes: 1, HindIII-digested lambda phage DNA, HaeIII-digested φX174 DNA, and
BamHI-digested control chicken blood DNA; 3, control semen DNA; 4,
SW272/cGH-infected G0 semen DNA; 6, control blood DNA; 7, SW272/cGH-infected
G0 blood DNA; 9, BamHI-digested plasmid pSW272/cGH (20 pg) and control chicken
blood DNA. Arrows, internal BamHI fragments of vector DNA; asterisks,
endogenous BamHI fragments derived from the cGH gene. (C) Hybridization with
virus-specific probe. Lanes: 1 to 9, same as in panels A and B; 10, blank; 11
and 12, BamHI-digested plasmids pSW279 and pSW253, which carry a hybrid
SNV/REV-A provirus and REV-A provirus, respectively.

Conclusion. Replication-defective REV vectors can intro- 
duce new genetic information into the chicken by infecting 
somatic stem cells of the embryo. Susceptibility of these 
stem cells to infection by REV vectors provides another 
approach to the in vivo study of avian development (4) and 
vector-mediated gene expression. The possible applications 
of this technology are numerous. 
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Transgenic production of a variant of human tissue-type 
plasminogen activator in goat milk: generation of transgenic goats 
and analysis of expression. 

Ebert KM, Selgrath JP, DiTullio P, Denman J, Smith TE, Memon MA, 
Schindler JE, Monastersky GM, Vitale JA, Gordon K. 

Tufts University School of Veterinary Medicine, North Grafton, MA 01536- 
1895. 

We report the first successful production of transgenic goats that express a 
heterologous protein in their milk. The production of a glycosylation variant of 
human tPA (LAtPA-longer acting tissue plasminogen activator) from an 
expression vector containing the murine whey acid promoter (WAP) operatively 
linked to the cDNA of a modified version of human tPA was examined in 
transgenic dairy goats. Two transgenic goats were identified from 29 animals 
born. The first animal, a female, was mated and allowed to carry the pregnancy 
to term. Milk was obtained upon parturition and was shown to contain 
enzymatically active LAtPA at a concentration of 3 micrograms/ml. 

PMID: 1367544 [PubMed - indexed for MEDLINE] 
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Transgenic expression of a variant of human tissue-type 
plasminogen activator in goat milk: purification and 
characterization of the recombinant enzyme. 

Denman J, Hayes M, O'Day C, Edmunds T, Bartlett C, Hirani S, Ebert 
KM, Gordon K, McPherson JM. 

Genzyme Corporation, Framingham, MA 01701. 

A glycosylation variant of human tissue-type plasminogen activator (tPA) 
designated longer-acting tissue-type plasminogen activator (LAtPA) was 
extensively purified from the milk of a transgenic goat by a combination of acid 
fractionation, hydrophobic interaction chromatography and immunoaffinity 
chromatography. This scheme provided greater than 8,000-fold purification of 
LAtPA, a cumulative yield of 25% and purity greater than 98% as judged by 
SDS gel electrophoresis. SDS gel electrophoresis revealed that the transgenic 
enzyme was predominantly the "two chain" form of the protease. The specific 
activity of the purified transgenic protein, based on the average of the values 
obtained for three different preparations, was 610,000 U/mg as judged by 
amidolytic activity assay. This was approximately 84% of the value observed 
for the recombinant enzyme produced in mouse C127 cells. Analysis of the 
transgenic protein indicated that it had a significantly different carbohydrate 
composition from the recombinant enzyme produced in C127 cells. Molecular 
size analysis of the oligosaccharides from the transgenic and C127 cell-derived 
LAtPA preparations confirmed their differences and showed that the mouse 
cell-derived preparation contained larger, complex-type N-linked 
oligosaccharide structures than the material produced in goat mammary tissue. 

PMID: 1367545 [PubMed - indexed for MEDLINE] 
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Fig. 4. (Left) YUCCA is involved in tryptophan-dependent auxin biosynthesis, and the YUCCA 
pathway is functional in other plants. (A) yucca is less sensitive to toxic tryptophan analogs. 
Wild-type (left) and yucca seedlings were grown on 0.5X MS medium containing 100 µM 5-mT for 
10 days. (B) Comparison of wild-type (left) and transgenic tobacco plants overexpressing YUCCA. 
Fig. 5. (Right) YUCCA catalyzes a key step in auxin biosynthesis. Putative tryptophan-dependent 
auxin biosynthesis pathways and intermediates are shown (2). The indole-3-acetaldoxime 
intermediate was proposed recently (25). 



may yield additional clues that can be used to 
elucidate the physiological roles of their 
mammalian counterparts. 
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Transgenic Monkeys Produced 
by Retroviral Gene Transfer 
into Mature Oocytes 

A. W. S. Chan, K. Y. Chong, C. Martinovich, C. Simerly, G. Schatten* 

Transgenic rhesus monkeys carrying the green fluorescent protein (GFP) gene 
were produced by injecting pseudotyped replication-defective retroviral vector 
into the perivitelline space of 224 mature rhesus oocytes, later fertilized by 
intracytoplasmic sperm injection. Of the three males born from 20 embryo 
transfers, one was transgenic when accessible tissues were assayed for trans- 
gene DNA and messenger RNA. All tissues that were studied from a fraternal 
set of twins, miscarried at 73 days, carried the transgene, as confirmed by 
Southern analyses, and the GFP transgene reporter was detected by both direct 
and indirect fluorescence imaging. 



Although transgenic mice have been invaluable in accelerating the advancement
of biomedical sciences (1-5), many differences between humans and rodents have
limited their usefulness (6-9). The major obstacle in producing transgenic
nonhuman primates has been the low efficiency of conventional gene transfer
protocols. By adapting a pseudotyped vector system, efficient at up to 100% in
cattle (10, 11), we circumvented problems in traditional gene transfer
methodology to produce transgenic primates.

Fig. 1. Injection of VSV-G pseudotyped retroviral vector, enclosing the GFP
gene and protein, into the perivitelline space of mature rhesus oocytes. (A)
Transmitted light and (B) epifluorescence imaging of GFP carried within the
vector particles. Magnification: X100.

We injected 224 mature rhesus oocytes with high titer [10^ to 10*'
colony-forming units (cfu)/ml] Moloney retroviral vector pseudotyped with
vesicular stomatitis virus envelope glycoprotein G (VSV-G pseudotype) into the
perivitelline space (Fig. 1; Table 1; 10-12). The VSV-G pseudotype carried the
GFP gene under the control of either the cytomegalovirus early promoter (CMV)
[referred to as LNCEGFP-(VSV-G)] or the human elongation factor-1 alpha
promoter (hEF-1α) [referred to as LNEFEGFP-(VSV-G)] (13). Because ~10 to
100 pl was introduced into the perivitelline space, between 1 and 10 vector
particles were introduced using LNCEGFP-(VSV-G) [10*' cfu/ml] and between 0.1
to 1 with LNEFEGFP-(VSV-G) (10^ cfu/ml). Oocytes were cultured for 6 hours
before fertilization by intracytoplasmic sperm injection (ICSI). Vector
particles incorporated into the oocyte in <4.5 hours as imaged by electron
microscopy (14). Fifty-seven percent (n = 126) of embryos developed beyond the
four-cell stage and 40 embryos were transferred to 20 surrogates, each
carrying two embryos (Table 1). Rates for reproductive parameters are:
fertilization [77% ICSI controls (15) versus 75% transgenesis], embryonic
development [75% ICSI controls (15) versus 57% transgenesis], and implantation
[66% ICSI controls (16) versus 25% transgenesis]. Most control ICSI
pregnancies result in live offspring (83%) (16).
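
The particle estimate in this paragraph is a product of titer and injected volume. The Python sketch below is a hedged illustration of that arithmetic. The titers of 1e8 and 1e7 cfu/ml are assumed example values, chosen only because they reproduce the ranges of 1 to 10 and 0.1 to 1 particles quoted for 10 to 100 pl; the function names and the Poisson treatment of the per-oocyte dose are also assumptions introduced here, not part of the reported method.

```python
import math

PL_PER_ML = 1e9  # 1 ml = 10^9 pl


def expected_particles(titer_cfu_per_ml, volume_pl):
    """Mean number of infectious units delivered in one injection."""
    return titer_cfu_per_ml * volume_pl / PL_PER_ML


def prob_at_least_one(mean_particles):
    """Probability of delivering one or more particles, assuming a Poisson dose."""
    return 1.0 - math.exp(-mean_particles)


if __name__ == "__main__":
    # Assumed illustrative titers (cfu/ml); not values confirmed by the text.
    for titer in (1e8, 1e7):
        for volume_pl in (10, 100):
            mean = expected_particles(titer, volume_pl)
            print(f"titer={titer:.0e} cfu/ml, volume={volume_pl} pl: "
                  f"mean={mean:.2f}, P(>=1)={prob_at_least_one(mean):.2f}")
```

Under the Poisson assumption, a mean dose below one particle implies that a substantial share of injected oocytes would receive no vector at all, which is one way to think about differences in efficiency between vector preparations.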

Five pregnancies resulted in the births of three healthy males (Table 1,
Fig. 2). A set of fraternal twins miscarried at 73 days (150 to 155 days
normal gestation) and a blighted pregnancy (implantation attempt without a
fetus) also occurred. One fetal twin of the miscarriage was an anatomically
normal male, while the other was largely resorbed in utero. The three births
and the blighted pregnancy resulted from nine embryo transfers in which
LNEFEGFP-(VSV-G) was used, whereas the twin pregnancy was established from 11
embryo transfers with LNCEGFP-(VSV-G) (Table 1).

Transgene integration, transcription, and expression from the newborns were
examined in hair, blood, umbilical cords, placentae, cultured lymphocytes,
buccal epithelial cells, and urogenital cells passed in urine, along with 13
tissues from the male stillborn, nine from the resorbed one, and specimens
from the blighted pregnancy (17). Polymerase chain reaction (PCR) was
performed with primer sets that covered the flanking region of the vector
pLNC-EGFP or pLNEF-EGFP and the GFP gene (18). One newborn, ANDi, showed the
presence of the transgene in all analyzed tissues, and the transgene was
present in all tissues analyzed from both stillbirths including placentae and
testes (Fig. 3). Total RNA was extracted for standard reverse transcription
followed by PCR amplification (RT-PCR) with primer sets specific for the
transgene (18). Transgene transcription was demonstrated in all of the tissues
in the fetuses and in the accessible tissues from the infant carrying the
transgene (Fig. 3).

Table 1. Transgenesis efficiency in rhesus embryos, fetuses, and offspring.

                                                  VSV-G pseudotype
Construct                                   LNCEGFP     LNEFEGFP    Overall
Eggs injected with vector                   157         67          224
Eggs then injected with sperm               157         65          222
Fertilization rate                          108 (69%)   58 (89%)    166 (75%)
Embryonic development of fertilized eggs    85 (79%)    41 (71%)    126 (76%)
Embryos transferred (two/surrogate)         22          18          40
Number of surrogates                        11          9           20
Pregnancies/surrogate                       1* (9%)     4 (44%)     5 (25%)
Fetal losses                                2 (100%)    1 (25%)     3 (50%)
Births                                      0           3           3
Transgenic                                  2 of 2      1 of 4      3 of 6
Transgenic birth/embryos transferred        0           1 (5.5%)    1 (2.5%)
Transgenic birth/pregnancies                0           1 (25%)     1 (20%)

*Twin pregnancy.
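
The percentages in Table 1 can be recomputed from the raw counts. The sketch below is a hedged illustration using the overall column; the dictionary keys and function names are assumptions introduced here. It also shows that the 76% development figure in the table appears to be relative to fertilized eggs, whereas the 57% quoted in the text corresponds to the same 126 embryos taken relative to all sperm-injected eggs.

```python
# Counts transcribed from the "Overall" column of Table 1.
OVERALL = {
    "sperm_injected_eggs": 222,
    "fertilized": 166,
    "developed_embryos": 126,
    "embryos_transferred": 40,
    "surrogates": 20,
    "pregnancies": 5,
    "transgenic_births": 1,
}


def pct(numerator, denominator):
    """Percentage of numerator over denominator."""
    return 100.0 * numerator / denominator


def efficiency_report(c):
    """Recompute the rates reported in the table from raw counts."""
    return {
        "fertilization": pct(c["fertilized"], c["sperm_injected_eggs"]),
        "development_of_fertilized": pct(c["developed_embryos"], c["fertilized"]),
        "development_of_sperm_injected": pct(c["developed_embryos"],
                                             c["sperm_injected_eggs"]),
        "pregnancies_per_surrogate": pct(c["pregnancies"], c["surrogates"]),
        "transgenic_birth_per_embryo": pct(c["transgenic_births"],
                                           c["embryos_transferred"]),
        "transgenic_birth_per_pregnancy": pct(c["transgenic_births"],
                                              c["pregnancies"]),
    }
```

Calling `efficiency_report(OVERALL)` returns the rates as unrounded percentages, which can then be compared with the rounded values printed in the table.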

Fig. 3. PCR and RT-PCR analyses of transgenic and control tissues.
(A) Thirteen tissues from an intact [illegible] tissues for RT-PCR.
(C) Analysis of the male stillborn. Tissues from the reabsorbed
fetus were collected from eight different regions to ensure broad
representation because precise anatomical specification was
limited. (D through F) PCR, RT-PCR of the reabsorbed fetus. A total
of seven samples were obtained from each offspring for PCR (G), two
samples for RT-PCR (H) from "ANDi" and one of the other two male
offspring. (I) Analysis of the newborns. Indicates [illegible] male
with the presence of mRNA in all analyzed tissues. Co, cord; Bo,
blood; Ly, lymphocyte; Bu, buccal cells; Ur, urine; Ha, hair; Pl,
placenta; Lu, lung; Li, liver; He, heart; In, intestine; Ki,
kidney; Bl, bladder; Te, testis; Mu, muscle; Sk, skin; Ta, tail;
Pa, pancreas; Sp, spleen; T1 = placenta from reabsorbed fetus; T2
to T9 = tissues retrieved from eight regions of the reabsorbed
fetus; C1 = nontransgenic rhesus tissue; C2 = C1 + pLNC-EGFP; C3 =
ddH2O; C4 = 293GP-LNCEGFP packaging cell; C5 = nontransgenic liver;
C6 = transgenic lung without DNase; C7 = transgenic lung without
reverse transcription; C8 = C1 + pLNEF-EGFP. ND, not determined.

Southern blot analysis of 10 tissues from the male stillbirth and
eight samples from the other twin demonstrated multiple integration
sites into their genomic DNA (Fig. 4) (19). Vector integration was
determined by PCR of placenta, cord, blood, hair, and buccal cells
using a primer set specific for the unique retroviral long terminal
repeat (LTR) regions indicative of successful provirus integration
into the host genome (20, 21). This provirus sequence was found in
one infant and both stillbirths (Fig. 4D). Infant welfare
considerations limited tissue availability, and the genomic DNA
obtained was insufficient for Southern analysis. The male infant
with the inserted transgene has been named "ANDi" (for "inserted
DNA," in a reverse transcribed direction; Fig. 2A).
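
The Southern blot reasoning used here (an enzyme that cuts the transgene once gives bands of different sizes when the transgene sits at different genomic loci) can be illustrated with a minimal sketch. Every number and function name below is hypothetical and chosen only for illustration; none is taken from the study, and the model assumes the probe hybridizes to the transgene portion on one side of the cut.

```python
def junction_fragment_sizes(transgene_len, cut_pos, flank_distances):
    """Size of the probe-positive junction fragment at each integration site.

    transgene_len   : length of the integrated transgene in bp (hypothetical)
    cut_pos         : position of the single restriction site in the transgene
    flank_distances : for each integration site, the distance in bp from the
                      end of the transgene to the nearest genomic restriction
                      site (hypothetical values)
    """
    if not 0 < cut_pos < transgene_len:
        raise ValueError("cut_pos must lie inside the transgene")
    # Each fragment spans the transgene portion past the cut plus host DNA up
    # to the next genomic site, so its size depends on the integration locus.
    return [transgene_len - cut_pos + d for d in flank_distances]


def minimum_integration_sites(band_sizes):
    """Distinct band sizes give a lower bound on the number of sites."""
    return len(set(band_sizes))


bands = junction_fragment_sizes(5000, 2000, [1500, 400, 7000])
print(sorted(bands))                     # [3400, 4500, 10000]
print(minimum_integration_sites(bands))  # 3
```

Under these assumptions, the number of distinct bands is only a lower bound: two integration sites whose flanking distances happen to match would produce a single band.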

GFP direct fluorescence in the toenails and hair of the fetus, as
well as the placenta (Fig. 2, B through F), provided further
evidence of transgenesis. Colocalization between direct GFP
fluorescence and indirect anti-GFP immunocytochemical imaging
demonstrated that the GFP protein is found exclusively at the
direct fluorescence sources (Fig. 2, D through F). Furthermore,
neither direct fluorescein nor indirect rhodamine fluorescence was
observed in controls (22). Because tissues from the fetus
originated from the three germ layers, transgene integration may
have occurred before implantation, perhaps even before the first
DNA replication cycle (10). The high efficiency of this approach
has been linked to the absence of the nuclear envelope in oocytes
naturally arrested in second meiotic metaphase (10, 23).

Fig. 4. (A) Southern blot analysis of HindIII (single digestion
site) digested genomic DNA. Full-length GFP labeled with [32P] was
used as a probe to detect the transgene, which was detected in
genomic DNA of the normal male stillbirth (B) and reabsorbed fetus
(C). Nontransgenic rhesus tissue was used as a negative control
(C1) and pLNC-EGFP DNA as a positive control (not shown). Various
sized fragments were demonstrated in tissues obtained from each.
This result indicates multiple integration sites due to the use of
a restriction enzyme with a single digestion site within the
transgene. (D) Detection of the unique provirus sequence. A total
of five tissues from each infant and two tissues from a male
stillbirth and the reabsorbed fetus were submitted for [illegible].
This sequence was detected in "ANDi" and the two stillbirths (42),
which indicates that [illegible] transgenic. Abbreviations are the
same as those in Fig. 3. Mu, muscle from the male stillborn; T3,
tissue from the reabsorbed fetus.

The miscarriage is likely due to the twin pregnancy, which is rare
and high-risk in rhesus. The twin stillbirth originated from the
higher titer vector, whereas the three births, including the
transgenic one, and the blighted pregnancy originated from the
lower titer LNEFEGFP-(VSV-G) vector (10[exponent illegible] cfu/ml;
Table 1). Although only one live offspring is shown to be
transgenic, we cannot yet exclude the possibility of transgenic
mosaics in the others. We have demonstrated neither germline
transmission nor the presence of transgenic sperm; this must await
ANDi's development through puberty in about 4 years. Vector titers
and volume injected may play crucial roles in gene transfer
efficiency. These offspring and their surrogates are now housed in
dedicated facilities with ongoing, stringent monitoring.

Nonhuman primates are invaluable models for advancing gene therapy
treatments for diseases such as Parkinson's (24) and diabetes (25),
as well as ideal models for testing cell therapies (26) and
vaccines, including those for HIV (27, 28). Although we have
demonstrated transgene introduction in rhesus monkeys, significant
hurdles remain for the successful homologous recombination
essential for gene targeting (29). The molecular approaches for
making clones [either by embryo splitting (30) or nuclear transfer
(31-36)], utilizing stem cells (37-39), and now producing
transgenic monkeys, could be combined to produce the ideal models
to accelerate discoveries and to bridge the scientific gap between
transgenic mice and humans.
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Categorical Representation of 
Visual Stimuli in the Primate 
Prefrontal Cortex 

David J. Freedman, Maximilian Riesenhuber,
Tomaso Poggio, Earl K. Miller*

The ability to group stimuli into meaningful categories is a
fundamental cognitive process. To explore its neural basis, we
trained monkeys to categorize computer-generated stimuli as "cats"
and "dogs." A morphing system was used to systematically vary
stimulus shape and precisely define the category boundary. Neural
activity in the lateral prefrontal cortex reflected the category of
visual stimuli, even when a monkey was retrained with the stimuli
assigned to new categories.



Categorization refers to the ability to react similarly to stimuli
when they are physically distinct, and to react differently to
stimuli that may be physically similar (1). For example, we
recognize an apple and a banana to be in the same category (food)
even though they are dissimilar in appearance, and we consider an
apple and a billiard ball to be in different categories even though
they are similar in shape and sometimes color. Categorization is
fundamental; our raw perceptions would be useless without our
classification of items as furniture or food. Although a great deal
is known about the neural analysis of visual features, little is
known about the neural basis of the categorical information that
gives them meaning.
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In advanced animals, most categories are 
learned. Monkeys can learn to categorize stimuli as animal or
non-animal (2), food or non-food (3), tree or non-tree, fish or
non-fish (4), and by ordinal number (5). The neural correlate of
such perceptual categories might be found in brain areas that
process visual form. The inferior temporal (IT) and prefrontal (PF)
cortices are likely candidates; their neurons are sensitive to form
and they are important for a wide range of visual behaviors
(10-12).

The hallmark of perceptual categorization is a sharp "boundary"
(13). That is, stimuli from different categories that are similar
in appearance (e.g., apple/billiard ball) are treated as different,
whereas distinct stimuli within the same category (e.g.,
apple/banana) are treated alike. Presumably, there are neurons that
also represent such sharp distinctions. This is difficult to assess
with a small subset of a large, amorphous category (e.g., food,
human, etc.). Because the category boundary is unknown, it is
unclear whether neural activity reflects category membership or
physical similarity.
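
The distinction between category membership and physical similarity can be made concrete with a minimal sketch. It assumes, purely for illustration, that each stimulus is described by a single number (the percentage of "cat" in a cat/dog morph) and that the category boundary sits at 50%; the boundary value, the function names, and the example stimuli are all hypothetical and are not taken from the text.

```python
def category(percent_cat, boundary=50):
    """Label a morph by which side of the (assumed) boundary it falls on."""
    if not 0 <= percent_cat <= 100:
        raise ValueError("percent_cat must be between 0 and 100")
    if percent_cat == boundary:
        raise ValueError("a stimulus exactly on the boundary has no category")
    return "cat" if percent_cat > boundary else "dog"


def physical_distance(a, b):
    """Physical dissimilarity, taken here as the difference in morph level."""
    return abs(a - b)


# Physically similar, yet on opposite sides of the boundary.
print(physical_distance(45, 55), category(45), category(55))  # 10 dog cat

# Physically dissimilar, yet members of the same category.
print(physical_distance(55, 95), category(55), category(95))  # 40 cat cat
```

With a known boundary, a response that tracks the label rather than the distance can be attributed to category membership, which is the ambiguity the paragraph above describes for natural categories.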
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Generation of transgenic dairy cattle using 'in vitro' embryo 
production. 

Krimpenfort P, Rademakers A, Eyestone W, van der Schans A, van den 
Broek S, Kooiman P, Kootwijk E, Platenburg G, Pieper F, Stryker R. 

Department of Embryology, Gene Pharming Europe B.V., Leiden, The 
Netherlands. 

We have combined gene transfer, by microinjection, with 'in vitro' embryo
production technology, enabling us to carry out non-surgical transfer, to 
recipient cows, of microinjected embryos that have been cultured from 
immature oocytes. Using this approach, we have established 21 pregnancies 
from which 19 calves were born. Southern blot analysis proved that in two cases 
the microinjected DNA had been integrated in the host genome. 

PMID: 1367358 [PubMed - indexed for MEDLINE]

Simons, J. Paul; Wilmut, Ian; Clark, A. John; Archibald, Alan
L.; Bishop, John O.; Lathe, Richard
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0733-222X. English. 

Six transgenic sheep were produced by microinjection of DNA into
the pronuclei of single-cell eggs. Three DNA constructs were
microinjected: (1) pMK, which contains the mouse
metallothionein-1 (MT) promoter linked to the herpes simplex
thymidine kinase gene (TK); (2) BLG-FIX, which contains the
β-lactoglobulin gene (BLG) linked to cDNA sequences encoding
human blood-coagulation factor IX; and (3) BLG-a1AT, which
contains the BLG gene linked to cDNA sequences encoding human
α1-antitrypsin. The DNA in the transgenic sheep has not
undergone rearrangement, as verified by hybridization assays.
Hybridization intensities revealed the presence of single and
multiple copies of constructs in the 6 lambs. Multiple copies had
head-to-head and head-to-tail tandem arrangements. One of the
offspring has the pMK construct, 4 of the offspring carry the
BLG-FIX construct, and the last offspring carries the BLG-a1AT
construct. Offspring from these transgenic sheep also carry the
transgenes.

Clark, A. J.; Bessos, H.; Bishop, J. O.; Brown, P.; Harris, S.;
Lathe, R.; McClenaghan, M.; Prowse, C.; Simons, J. P.; et al.

[citation details illegible] CODEN: BTCHDA; ISSN:

Transgenic livestock may prove useful for the large scale
production of valuable proteins. By targeting expression to the
[illegible] harvested from milk. To
this end, a hybrid gene was designed to direct the synthesis of
human anti-hemophilic factor IX to the mammary gland and
introduced into sheep. Two transgenic ewes, each carrying
about 10 copies of the foreign gene, have been analyzed for
expression. Both animals express human factor IX RNA in the
mammary gland and secrete the corresponding protein into their
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